Posted to user@spark.apache.org by Rishikesh Gawade <ri...@gmail.com> on 2018/04/15 19:12:09 UTC

ERROR: Hive on Spark

Hello there. I am a newbie in the world of Spark and have been working on a
Spark project in Java.
I have configured Hive and Spark to run on Hadoop.
So far I have created a Hive (Derby-backed) database on HDFS at the
warehouse location /user/hive/warehouse, with the database name spam
(saved as spam.db at that location).
I have been trying to read the tables in this database in Spark to create
RDDs/DataFrames.
Could anybody please guide me on how I can achieve this?
I used the following statements in my Java code:

SparkSession spark = SparkSession
        .builder()
        .appName("Java Spark Hive Example")
        .master("yarn")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .enableHiveSupport()
        .getOrCreate();
spark.sql("USE spam");
spark.sql("SELECT * FROM spamdataset").show();

After this I built the project with Maven (mvn clean package -DskipTests),
which generated a JAR.

After this, I tried running the project via the spark-submit CLI:

spark-submit --class com.adbms.SpamFilter --master yarn
~/IdeaProjects/mlproject/target/mlproject-1.0-SNAPSHOT.jar

and got the following error:

Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'spam' not found;
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:174)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:256)
    at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3253)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3252)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at com.adbms.SpamFilter.main(SpamFilter.java:54)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Also, when I replaced the SQL query with "SHOW DATABASES", it showed only
one database, "default"; the databases stored in the HDFS warehouse
directory weren't shown.
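The same check can be made from Java through the Catalog API; a minimal
sketch, assuming the SparkSession "spark" built in the snippet above:

```java
// Assumes the SparkSession "spark" from the snippet above.
// List the databases this session's catalog can actually see.
// If Hive's hive-site.xml is not on Spark's classpath, Spark falls back
// to a local catalog containing only "default", which would match the
// behaviour described here.
spark.catalog().listDatabases().show();

// Tables visible in the default database, for comparison.
spark.catalog().listTables("default").show();
```

(Output requires a running Spark session, so none is shown here.)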

Please check this and, if anything is wrong, suggest the recommended way
to read Hive tables on Hadoop from Spark using Java. A link to a webpage
with relevant information would also be appreciated.
Thank you in anticipation.
Regards,
Rishikesh Gawade

Re: ERROR: Hive on Spark

Posted by naresh Goud <na...@gmail.com>.
Change the table name in your query to spam.spamdataset instead of spamdataset.
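In code, the suggested change is just to qualify the table with its
database; a minimal sketch, assuming the session setup from the original
message (the class name here is hypothetical, for illustration only):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SpamFilterQualified {  // hypothetical name for illustration
    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Java Spark Hive Example")
                .master("yarn")
                .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
                .enableHiveSupport()
                .getOrCreate();

        // Database-qualified table name: no prior "USE spam" needed.
        Dataset<Row> df = spark.sql("SELECT * FROM spam.spamdataset");
        df.show();

        spark.stop();
    }
}
```

Note this only helps if the 'spam' database is actually visible to the
session's catalog; the same query will fail with a similar error otherwise.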

On Sun, Apr 15, 2018 at 2:12 PM Rishikesh Gawade <ri...@gmail.com>
wrote:

Thanks,
Naresh
www.linkedin.com/in/naresh-dulam
http://hadoopandspark.blogspot.com/