Posted to dev@spark.apache.org by Doug Balog <do...@dugos.com> on 2016/05/19 20:56:08 UTC
Possible Hive problem with Spark 2.0.0 preview.
I haven’t had time to really look into this problem, but I want to mention it.
I downloaded http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-bin/spark-2.0.0-preview-bin-hadoop2.7.tgz
and tried to run it against our Secure Hadoop cluster and access a Hive table.
1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)” doesn’t work because of “HiveContext not a member of org.apache.spark.sql.hive”. I checked the documentation, and it looks like it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz.
2. I also tried the new SparkSession, ‘spark.table(“db.table”)’; it fails with an HDFS permission-denied error: can’t write to “/user/hive/warehouse”.
Is there a new config option that I missed ?
I tried a SNAPSHOT version, downloaded from Patrick’s apache dir on Apr 26th, and it worked the way I expected.
I’m going to go through the commits and see which one broke it, but my builds are not running (no such method ConcurrentHashMap.keySet()), so I have to fix that problem first.
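For anyone following along, stepping through the commits can be sketched with git bisect (assuming a local clone of the Spark repo; the good/bad shas and the test command are placeholders for whatever reproduces the failure):

```shell
# Sketch: narrow down the breaking commit between a known-good and a
# known-bad revision. <good-sha> is a placeholder.
git bisect start
git bisect bad                 # current HEAD shows the failure
git bisect good <good-sha>     # last revision known to work
# at each step: build, run the repro, then mark the result with
#   git bisect good    or    git bisect bad
git bisect reset               # return to the original HEAD when done
```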
Thanks for any hints.
Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org
Re: Possible Hive problem with Spark 2.0.0 preview.
Posted by Doug Balog <do...@dugos.com>.
Some more info; I’m still digging.
I’m just trying to do `spark.table(“db.table”).count` from a spark-shell.
“db.table” is just a Hive table.
At commit b67668b this worked just fine and it returned the number of rows in db.table.
Starting at ca99171 “[SPARK-15073][SQL] Hide SparkSession constructor from the public”, it fails with:
org.apache.spark.sql.AnalysisException: Database ‘db' does not exist;
at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.requireDbExists(ExternalCatalog.scala:37)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.tableExists(InMemoryCatalog.scala:195)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.requireTableExists(InMemoryCatalog.scala:63)
at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.getTable(InMemoryCatalog.scala:186)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:337)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:524)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:520)
... 48 elided
If I run “org.apache.spark.sql.SparkSession.builder.enableHiveSupport.getOrCreate.catalog.listDatabases.show(false)”, I get:
+-------+----------------+-------------------------------+
|name   |description     |locationUri                    |
+-------+----------------+-------------------------------+
|default|default database|hdfs://ns/{CWD}/spark-warehouse|
+-------+----------------+-------------------------------+
Where CWD is the current working directory of where I started my spark-shell.
It looks like this commit causes spark.catalog to be the internal one instead of the Hive one.
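If anyone wants to poke at this: as far as I can tell from the 2.0 sources, the switch that enableHiveSupport flips is the internal spark.sql.catalogImplementation setting, so a possible workaround is to force it at launch. The config name and value here are an assumption taken from the source, not documented API:

```shell
# Sketch of a possible workaround (spark.sql.catalogImplementation is an
# internal setting; the name/value are an assumption from the 2.0 sources):
./bin/spark-shell --conf spark.sql.catalogImplementation=hive
```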
Michael, I don’t think this is related to the HDFS configuration; the config files are in /etc/hadoop/conf on each of the nodes in the cluster.
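For completeness, the standard way I point spark-shell at those configs is via HADOOP_CONF_DIR (shown only to rule this out):

```shell
# Standard way to make spark-shell pick up the cluster's Hadoop configs:
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-shell
```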
Arun, I was referring to these docs: http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/sql-programming-guide.html. They need to be updated to no longer refer to HiveContext.
I don’t think HiveContext should be marked as private[hive]; it should be public.
I’ll keep digging.
Doug
> On May 19, 2016, at 6:52 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> The old one is deprecated but should still work though.
>
>
> On Thu, May 19, 2016 at 3:51 PM, Arun Allamsetty <ar...@gmail.com> wrote:
> Hi Doug,
>
> If you look at the API docs here: http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext, you'll see
> Deprecated (Since version 2.0.0): Use SparkSession.builder.enableHiveSupport instead
> So you probably need to use that.
>
> Arun
>
> On Thu, May 19, 2016 at 3:44 PM, Michael Armbrust <mi...@databricks.com> wrote:
> 1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)” doesn’t work because “HiveContext not a member of org.apache.spark.sql.hive” I checked the documentation, and it looks like it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz
>
> HiveContext has been deprecated and moved to a 1.x compatibility package, which you'll need to include explicitly. Docs have not been updated yet.
>
> 2. I also tried the new spark session, ‘spark.table(“db.table”)’, it fails with a HDFS permission denied can’t write to “/user/hive/warehouse”
>
> Where are the HDFS configurations located? We might not be propagating them correctly any more.
>
>
Re: Possible Hive problem with Spark 2.0.0 preview.
Posted by Reynold Xin <rx...@databricks.com>.
The old one is deprecated but should still work though.
On Thu, May 19, 2016 at 3:51 PM, Arun Allamsetty <ar...@gmail.com>
wrote:
> Hi Doug,
>
> If you look at the API docs here:
> http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext,
> you'll see
> Deprecated (Since version 2.0.0): Use
> SparkSession.builder.enableHiveSupport instead
> So you probably need to use that.
>
> Arun
>
> On Thu, May 19, 2016 at 3:44 PM, Michael Armbrust <mi...@databricks.com>
> wrote:
>
>> 1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)”
>>> doesn’t work because “HiveContext not a member of
>>> org.apache.spark.sql.hive” I checked the documentation, and it looks like
>>> it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz
>>>
>>
>> HiveContext has been deprecated and moved to a 1.x compatibility package,
>> which you'll need to include explicitly. Docs have not been updated yet.
>>
>>
>>> 2. I also tried the new spark session, ‘spark.table(“db.table”)’, it
>>> fails with a HDFS permission denied can’t write to “/user/hive/warehouse”
>>>
>>
>> Where are the HDFS configurations located? We might not be propagating
>> them correctly any more.
>>
>
>
Re: Possible Hive problem with Spark 2.0.0 preview.
Posted by Arun Allamsetty <ar...@gmail.com>.
Hi Doug,
If you look at the API docs here:
http://home.apache.org/~pwendell/spark-releases/spark-2.0.0-preview-docs/api/scala/index.html#org.apache.spark.sql.hive.HiveContext,
you'll see
Deprecated (Since version 2.0.0): Use
SparkSession.builder.enableHiveSupport instead
So you probably need to use that.
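Something along these lines (a minimal sketch; “db.table” is a placeholder for an existing Hive table, and this needs spark-sql plus spark-hive on the classpath, or just a spark-shell):

```scala
// Minimal Spark 2.0 sketch of the suggested replacement for HiveContext.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()   // back the catalog with the Hive metastore
  .getOrCreate()

// "db.table" stands in for an existing Hive table
spark.table("db.table").count()
```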
Arun
On Thu, May 19, 2016 at 3:44 PM, Michael Armbrust <mi...@databricks.com>
wrote:
> 1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)”
>> doesn’t work because “HiveContext not a member of
>> org.apache.spark.sql.hive” I checked the documentation, and it looks like
>> it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz
>>
>
> HiveContext has been deprecated and moved to a 1.x compatibility package,
> which you'll need to include explicitly. Docs have not been updated yet.
>
>
>> 2. I also tried the new spark session, ‘spark.table(“db.table”)’, it
>> fails with a HDFS permission denied can’t write to “/user/hive/warehouse”
>>
>
> Where are the HDFS configurations located? We might not be propagating
> them correctly any more.
>
Re: Possible Hive problem with Spark 2.0.0 preview.
Posted by Michael Armbrust <mi...@databricks.com>.
>
> 1. “val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)”
> doesn’t work because “HiveContext not a member of
> org.apache.spark.sql.hive” I checked the documentation, and it looks like
> it should still work for spark-2.0.0-preview-bin-hadoop2.7.tgz
>
HiveContext has been deprecated and moved to a 1.x compatibility package,
which you'll need to include explicitly. Docs have not been updated yet.
> 2. I also tried the new spark session, ‘spark.table(“db.table”)’, it fails
> with a HDFS permission denied can’t write to “/user/hive/warehouse”
>
Where are the HDFS configurations located? We might not be propagating
them correctly any more.