Posted to dev@spark.apache.org by Michael Armbrust <mi...@databricks.com> on 2016/05/19 21:44:38 UTC

Re: Possible Hive problem with Spark 2.0.0 preview.

>
> 1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)”
> doesn’t work because “HiveContext not a member of
> org.apache.spark.sql.hive”  I checked the documentation, and it looks like
> it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz
>

HiveContext has been deprecated and moved to a 1.x compatibility package,
which you'll need to include explicitly.  Docs have not been updated yet.
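
For example, a minimal spark-shell sketch of the 2.0-style replacement (the
table name below is a placeholder, and I'm not showing the compatibility
import for the old HiveContext since its final package location isn't settled):

    // Spark 2.0: build a SparkSession with Hive support instead of a HiveContext
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()    // wires the Hive metastore in as the catalog
      .getOrCreate()

    spark.table("db.table").count()  // rough equivalent of sqlContext.table(...).count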


> 2. I also tried the new spark session, ‘spark.table(“db.table”)’, it fails
> with a HDFS permission denied can’t write to “/user/hive/warehouse”
>

Where are the HDFS configurations located?  We might not be propagating
them correctly any more.
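
As a quick sanity check from the shell, something like the following should
show what the session actually picked up (fs.defaultFS and hive.metastore.uris
are just illustrative keys; whether hive-site.xml gets merged in here is part
of what I'd want to verify):

    // Inspect the Hadoop configuration the running session sees
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    println(hadoopConf.get("fs.defaultFS"))         // expect the HDFS namenode URI
    println(hadoopConf.get("hive.metastore.uris"))  // null if hive-site.xml wasn't loaded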

Re: Possible Hive problem with Spark 2.0.0 preview.

Posted by Doug Balog <do...@dugos.com>.
Some more info; I'm still digging.
I'm just trying to do `spark.table("db.table").count` from a spark-shell.
"db.table" is just a Hive table.

At commit b67668b this worked just fine and it returned the number of rows in db.table.
Starting at ca99171 "[SPARK-15073][SQL] Hide SparkSession constructor from the public" it fails with:

org.apache.spark.sql.AnalysisException: Database 'db' does not exist;
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:37)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:195)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireTableExists(InMemoryCatalog.scala:63)
  at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.getTable(InMemoryCatalog.scala:186)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:337)
  at org.apache.spark.sql.SparkSession.table(SparkSession.scala:524)
  at org.apache.spark.sql.SparkSession.table(SparkSession.scala:520)
  ... 48 elided

If I run "org.apache.spark.sql.SparkSession.builder.enableHiveSupport.getOrCreate.catalog.listDatabases.show(false)"

+-------+----------------+-------------------------------+
|name   |description     |locationUri                    |
+-------+----------------+-------------------------------+
|default|default database|hdfs://ns/{CWD}/spark-warehouse|
+-------+----------------+-------------------------------+


Where CWD is the current working directory from which I started my spark-shell.
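
That path smells like the session's own default warehouse setting rather than
the Hive one. A rough way to check, assuming the preview exposes the
spark.sql.warehouse.dir key (I haven't confirmed the key name):

    // Where does this session think the warehouse lives?
    spark.conf.get("spark.sql.warehouse.dir", "<unset>")
    // A Hive-backed session should point at the metastore warehouse
    // (e.g. /user/hive/warehouse), not ./spark-warehouse under the CWD.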

It looks like this commit causes spark.catalog to be the internal one instead of the Hive one. 
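
One rough way to confirm that, treating both the key and the workaround as
assumptions since spark.sql.catalogImplementation is an internal setting:

    // "hive" => Hive external catalog; "in-memory" => the internal catalog
    spark.conf.get("spark.sql.catalogImplementation", "<unset>")

If it reports in-memory, relaunching the shell with
--conf spark.sql.catalogImplementation=hive might be a usable stopgap.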

Michael, I don't think this is related to the HDFS configurations; they are in /etc/hadoop/conf on each of the nodes in the cluster.

Arun, I was referring to these docs, http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html; they need to be updated to no longer refer to HiveContext.

I don't think HiveContext should be marked as private[hive]; it should be public.
 
I’ll keep digging.

Doug


> On May 19, 2016, at 6:52 PM, Reynold Xin <rx...@databricks.com> wrote:
> 
> The old one is deprecated but should still work though.




Re: Possible Hive problem with Spark 2.0.0 preview.

Posted by Reynold Xin <rx...@databricks.com>.
The old one is deprecated but should still work though.



Re: Possible Hive problem with Spark 2.0.0 preview.

Posted by Arun Allamsetty <ar...@gmail.com>.
Hi Doug,

If you look at the API docs here:
http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext,
you'll see
Deprecated (Since version 2.0.0): Use SparkSession.builder.enableHiveSupport instead.
So you probably need to use that.

Arun
