Posted to dev@hive.apache.org by Stana <st...@is-land.com.tw> on 2016/03/10 12:11:45 UTC

Error in Hive on Spark

 I am trying out Hive on Spark with Hive 2.0.0 and Spark 1.4.1, and I am
driving org.apache.hadoop.hive.ql.Driver from a Java application.

My setup is as follows:
1. Build the Spark 1.4.1 assembly jar without Hive.
2. Upload the Spark assembly jar to the Hadoop cluster.
3. Run the Java application from the Eclipse IDE on my client computer.

The application works and submits the MR job to the YARN cluster
successfully when using hiveConf.set("hive.execution.engine", "mr"),
but it throws exceptions with the Spark engine.

Finally, I traced the Hive source code and came to this conclusion:

In my situation, the SparkClientImpl class generates the spark-submit
shell command and executes it. The command sets --class to
RemoteDriver.class.getName() and the application jar to
SparkContext.jarOfClass(this.getClass()).get(), which is why
my application throws the exception.

Is that right? And how can I execute the application with the
Spark engine successfully from my client computer? Thanks a lot!


Java application code:

import org.apache.hadoop.hive.cli.CliSessionState;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.CommandNeedRetryException;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.processors.CommandProcessorResponse;
import org.apache.hadoop.hive.ql.session.SessionState;

public class TestHiveDriver {

	private static HiveConf hiveConf;
	private static Driver driver;
	private static CliSessionState ss;

	public static void main(String[] args) {

		String sql = "select * from hadoop0263_0 as a join hadoop0263_0 as b on (a.key = b.key)";
		ss = new CliSessionState(new HiveConf(SessionState.class));
		hiveConf = new HiveConf(Driver.class);

		// HDFS and YARN endpoints
		hiveConf.set("fs.default.name", "hdfs://storm0:9000");
		hiveConf.set("yarn.resourcemanager.address", "storm0:8032");
		hiveConf.set("yarn.resourcemanager.scheduler.address", "storm0:8030");
		hiveConf.set("yarn.resourcemanager.resource-tracker.address", "storm0:8031");
		hiveConf.set("yarn.resourcemanager.admin.address", "storm0:8033");
		hiveConf.set("mapreduce.framework.name", "yarn");
		hiveConf.set("mapreduce.jobhistory.address", "storm0:10020");

		// Metastore connection
		hiveConf.set("javax.jdo.option.ConnectionURL", "jdbc:mysql://storm0:3306/stana_metastore");
		hiveConf.set("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver");
		hiveConf.set("javax.jdo.option.ConnectionUserName", "root");
		hiveConf.set("javax.jdo.option.ConnectionPassword", "123456");
		hiveConf.setBoolean("hive.auto.convert.join", false);

		// Spark engine settings
		hiveConf.set("spark.yarn.jar", "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
		hiveConf.set("spark.home", "target/spark");
		hiveConf.set("hive.execution.engine", "spark");
		hiveConf.set("hive.dbname", "default");

		driver = new Driver(hiveConf);
		SessionState.start(hiveConf);

		CommandProcessorResponse res = null;
		try {
			res = driver.run(sql);
		} catch (CommandNeedRetryException e) {
			e.printStackTrace();
		}

		System.out.println("Response Code:" + res.getResponseCode());
		System.out.println("Error Message:" + res.getErrorMessage());
		System.out.println("SQL State:" + res.getSQLState());
	}
}




Exception of spark-engine:

16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
argv: /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
--properties-file
/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties
--class org.apache.hive.spark.client.RemoteDriver
/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
--remote-host MacBook-Pro.local --remote-port 51331 --conf
hive.spark.client.connect.timeout=1000 --conf
hive.spark.client.server.connect.timeout=90000 --conf
hive.spark.client.channel.log.level=null --conf
hive.spark.client.rpc.max.size=52428800 --conf
hive.spark.client.rpc.threads=8 --conf
hive.spark.client.secret.bits=256
16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
16/03/10 18:33:09 INFO SparkClientImpl: 	 client token: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 diagnostics: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 ApplicationMaster host: N/A
16/03/10 18:33:09 INFO SparkClientImpl: 	 ApplicationMaster RPC port: -1
16/03/10 18:33:09 INFO SparkClientImpl: 	 queue: default
16/03/10 18:33:09 INFO SparkClientImpl: 	 start time: 1457180833494
16/03/10 18:33:09 INFO SparkClientImpl: 	 final status: UNDEFINED
16/03/10 18:33:09 INFO SparkClientImpl: 	 tracking URL:
http://storm0:8088/proxy/application_1457002628102_0043/
16/03/10 18:33:09 INFO SparkClientImpl: 	 user: stana
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
Application report for application_1457002628102_0043 (state: FAILED)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
16/03/10 18:33:10 INFO SparkClientImpl: 	 client token: N/A
16/03/10 18:33:10 INFO SparkClientImpl: 	 diagnostics: Application
application_1457002628102_0043 failed 1 times due to AM Container for
appattempt_1457002628102_0043_000001 exited with  exitCode: -1000
16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output,
check application tracking
page:http://storm0:8088/proxy/application_1457002628102_0043/Then,
click on links to logs of each attempt.
16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics:
java.io.FileNotFoundException: File
file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
does not exist
16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing
the application.
16/03/10 18:33:10 INFO SparkClientImpl: 	 ApplicationMaster host: N/A
16/03/10 18:33:10 INFO SparkClientImpl: 	 ApplicationMaster RPC port: -1
16/03/10 18:33:10 INFO SparkClientImpl: 	 queue: default
16/03/10 18:33:10 INFO SparkClientImpl: 	 start time: 1457180833494
16/03/10 18:33:10 INFO SparkClientImpl: 	 final status: FAILED
16/03/10 18:33:10 INFO SparkClientImpl: 	 tracking URL:
http://storm0:8088/cluster/app/application_1457002628102_0043
16/03/10 18:33:10 INFO SparkClientImpl: 	 user: stana
16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main"
org.apache.spark.SparkException: Application
application_1457002628102_0043 finished with failed status
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.yarn.Client.main(Client.scala)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
java.lang.reflect.Method.invoke(Method.java:606)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
16/03/10 18:33:10 INFO SparkClientImpl: 	at
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
ShutdownHookManager: Shutdown hook called
16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
ShutdownHookManager: Deleting directory
/private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client
to connect.
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child
process exited before connecting back
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
	at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
[hive-exec-2.0.0.jar:?]
	at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
[test-classes/:?]
Caused by: java.lang.RuntimeException: Cancel client
'5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited
before connecting back
	at org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
~[hive-exec-2.0.0.jar:2.0.0]
	at org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
~[hive-exec-2.0.0.jar:2.0.0]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code 1.
FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
spark client.
16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to
get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
Failed to create spark client.
org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
Failed to create spark client.
	at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
	at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
	at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
	at org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)

Re: Error in Hive on Spark

Posted by Xuefu Zhang <xu...@uber.com>.
Yes, it seems more viable to integrate your application with HS2 via
JDBC or Thrift rather than at the code level.

--Xuefu
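
A minimal sketch of the JDBC route described above (assumptions: HiveServer2
is reachable on storm0 at the default port 10000, the credentials are
placeholders, and the hadoop0263_0 table from the original post exists):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSparkExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint; adjust host, port, and credentials.
        String url = "jdbc:hive2://storm0:10000/default";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url, "stana", "");
             Statement stmt = conn.createStatement()) {
            // Ask HS2 to run this session's queries on the Spark engine.
            stmt.execute("set hive.execution.engine=spark");
            ResultSet rs = stmt.executeQuery(
                "select * from hadoop0263_0 a join hadoop0263_0 b on (a.key = b.key)");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

With this route, HiveServer2 is the process that launches spark-submit, so the
hive-exec jar only needs to be visible to HS2 and the cluster rather than to
the client application.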

On Tue, Mar 22, 2016 at 12:01 AM, Stana <st...@is-land.com.tw> wrote:

> Hi, Xuefu
>
> You are right.
> Maybe I should launch spark-submit via HS2 or the Hive CLI?
>
> Thanks a lot,
> Stana
>
>
> 2016-03-22 1:16 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
>
> > Stana,
> >
> > I'm not sure if I fully understand the problem. spark-submit is launched on
> > the same host as your application, which should be able to access
> > hive-exec.jar. The YARN cluster needs the jar as well, but HS2 or the Hive
> > CLI would take care of that. Since you are using neither, it's your
> > application's responsibility to make that happen.
> >
> > Did I miss anything else?
> >
> > Thanks,
> > Xuefu
> >
> > On Sun, Mar 20, 2016 at 11:18 PM, Stana <st...@is-land.com.tw> wrote:
> >
> > > Does anyone have suggestions on setting a property for the
> > > hive-exec-2.0.0.jar path in the application?
> > > Something like
> > > 'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> > >
> > >
> > >
> > > 2016-03-11 10:53 GMT+08:00 Stana <st...@is-land.com.tw>:
> > >
> > > > Thanks for the reply.
> > > >
> > > > I have set the spark.home property in my application; otherwise the
> > > > application throws a 'SPARK_HOME not found' exception.
> > > >
> > > > I found hive source code in SparkClientImpl.java:
> > > >
> > > > private Thread startDriver(final RpcServer rpcServer, final String clientId,
> > > >     final String secret) throws IOException {
> > > >   ...
> > > >   List<String> argv = Lists.newArrayList();
> > > >   ...
> > > >   argv.add("--class");
> > > >   argv.add(RemoteDriver.class.getName());
> > > >
> > > >   String jar = "spark-internal";
> > > >   if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
> > > >     jar = SparkContext.jarOfClass(this.getClass()).get();
> > > >   }
> > > >   argv.add(jar);
> > > >   ...
> > > > }
> > > >
> > > > When Hive executes spark-submit, it generates the shell command with
> > > > --class org.apache.hive.spark.client.RemoteDriver and sets the jar path
> > > > from SparkContext.jarOfClass(this.getClass()).get(), which resolves to the
> > > > local path of hive-exec-2.0.0.jar.
> > > >
> > > > In my situation, the application and the yarn cluster are on different
> > > > clusters. When the application executed spark-submit against the yarn
> > > > cluster with the local path of hive-exec-2.0.0.jar, there was no
> > > > hive-exec-2.0.0.jar on the yarn cluster, so the application threw the
> > > > exception: "hive-exec-2.0.0.jar   does not exist ...".
> > > >
> > > > Can a property for the hive-exec-2.0.0.jar path be set in the application?
> > > > Something like 'hiveConf.set("hive.remote.driver.jar",
> > > > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> > > > If not, is it possible to achieve this in a future version?
> > > >
> > > >
> > > >
> > > >
> > > > 2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
> > > >
> > > >> You can probably avoid the problem by setting the environment variable
> > > >> SPARK_HOME or the JVM property spark.home to point to your Spark
> > > >> installation.
> > > >>
> > > >> --Xuefu
> > > >>
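
A minimal sketch of that suggestion; the Spark installation path below is
hypothetical and would need to point at a real unpacked Spark 1.4.1
distribution on the client machine:

public class SparkHomeSetup {
    public static void main(String[] args) {
        // Hypothetical local Spark 1.4.1 install; adjust to the real path.
        // Alternatively, export SPARK_HOME in the environment before starting the JVM.
        System.setProperty("spark.home", "/opt/spark-1.4.1-bin-hadoop2.6");
        // ...then build the HiveConf, SessionState, and Driver as in the original application.
    }
}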

Re: Error in Hive on Spark

Posted by Stana <st...@is-land.com.tw>.
Hi, Xuefu

You are right.
Maybe I should launch spark-submit via HS2 or the Hive CLI?

Thanks a lot,
Stana



Re: Error in Hive on Spark

Posted by Stana <st...@is-land.com.tw>.
Hi, Xuefu

You are right.
Maybe I should launch spark-submit through HS2 or the Hive CLI?
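
For reference, a minimal sketch of what going through HS2 could look like from
a Java client, using the standard Hive JDBC driver. The host, port, user, and
the per-session engine switch are assumptions for illustration (HiveServer2 on
storm0 at the default port 10000), not settings confirmed in this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TestViaHS2 {
    public static void main(String[] args) throws Exception {
        // Assumes HiveServer2 runs on storm0:10000 and hive-jdbc is on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://storm0:10000/default", "stana", "");
             Statement stmt = conn.createStatement()) {
            // Ask HS2 to use the Spark engine for this session (assumes the server allows it).
            stmt.execute("set hive.execution.engine=spark");
            try (ResultSet rs = stmt.executeQuery(
                    "select * from hadoop0263_0 as a join hadoop0263_0 as b on (a.key = b.key)")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}

With this approach HS2 would run spark-submit on its own host and, as quoted
below, take care of shipping hive-exec.jar to the cluster, which is the part a
standalone Driver-based application has to handle itself.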

Thanks a lot,
Stana


2016-03-22 1:16 GMT+08:00 Xuefu Zhang <xu...@uber.com>:

> Stana,
>
> I'm not sure I fully understand the problem. spark-submit is launched on
> the same host as your application, so it should be able to access
> hive-exec.jar. The YARN cluster needs the jar as well, but HS2 or the Hive
> CLI will take care of shipping it there. Since you are using neither, it is
> your application's responsibility to make that happen.
>
> Did I miss anything?
>
> Thanks,
> Xuefu
>

Re: Error in Hive on Spark

Posted by Xuefu Zhang <xu...@uber.com>.
Stana,

I'm not sure I fully understand the problem. spark-submit is launched on
the same host as your application, so it should be able to access
hive-exec.jar. The YARN cluster needs the jar as well, but HS2 or the Hive
CLI will take care of shipping it there. Since you are using neither, it is
your application's responsibility to make that happen.

Did I miss anything?

Thanks,
Xuefu
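
As an illustration of that last point, one way an application can make the jar
available is to stage it somewhere the cluster can already read, the same way
this thread stages the Spark assembly on HDFS. Below is a rough sketch using
the Hadoop FileSystem API; the paths are examples taken from earlier messages,
and note that Hive 2.0.0 still hands the client-local jar path to spark-submit,
so staging by itself does not change the SparkClientImpl command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageHiveExecJar {
    public static void main(String[] args) throws Exception {
        // Assumes the client machine can reach the cluster's HDFS at storm0:9000.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://storm0:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the local hive-exec jar next to the already-staged Spark assembly.
            fs.copyFromLocalFile(
                new Path("/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar"),
                new Path("/tmp/hive-exec-2.0.0.jar"));
        }
    }
}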

On Sun, Mar 20, 2016 at 11:18 PM, Stana <st...@is-land.com.tw> wrote:

> Does anyone have suggestions on setting the hive-exec-2.0.0.jar path as a
> property in the application? Something like
>
> 'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
>
>
>

Re: Error in Hive on Spark

Posted by Stana <st...@is-land.com.tw>.
Does anyone have suggestions on setting the hive-exec-2.0.0.jar path as a
property in the application? Something like
'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.



2016-03-11 10:53 GMT+08:00 Stana <st...@is-land.com.tw>:

> Thanks for the reply.
>
> I have set the property spark.home in my application; otherwise the
> application threw a 'SPARK_HOME not found' exception.
>
> I found this Hive source code in SparkClientImpl.java:
>
> private Thread startDriver(final RpcServer rpcServer, final String clientId,
>     final String secret) throws IOException {
>   ...
>   List<String> argv = Lists.newArrayList();
>   ...
>   argv.add("--class");
>   argv.add(RemoteDriver.class.getName());
>
>   String jar = "spark-internal";
>   if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
>     jar = SparkContext.jarOfClass(this.getClass()).get();
>   }
>   argv.add(jar);
>   ...
> }
>
> When Hive executes spark-submit, it generates the shell command with
> --class org.apache.hive.spark.client.RemoteDriver and sets the jar path from
> SparkContext.jarOfClass(this.getClass()).get(), which resolves to the local
> path of hive-exec-2.0.0.jar.
>
> In my situation, the application and the YARN cluster run on different
> machines. When the application ran spark-submit with the local path of
> hive-exec-2.0.0.jar against the YARN cluster, there was no
> hive-exec-2.0.0.jar on the cluster, so the application threw the exception
> "hive-exec-2.0.0.jar does not exist ...".
>
> Can the hive-exec-2.0.0.jar path be set as a property in the application?
> Something like 'hiveConf.set("hive.remote.driver.jar",
> "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> If not, is it possible to achieve this in a future version?
>
>
>
>
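
To make the jar resolution above concrete, here is a small sketch (assuming
Spark 1.4.x and hive-exec-2.0.0.jar on the classpath) that prints the path
SparkClientImpl would hand to spark-submit; on a developer machine it resolves
to the local Maven repository copy of hive-exec-2.0.0.jar, which is exactly the
file the NodeManager later reports as missing:

import org.apache.spark.SparkContext;
import scala.Option;

public class ShowRemoteDriverJar {
    public static void main(String[] args) {
        // jarOfClass returns the jar the given class was loaded from, if any.
        Option<String> jar =
            SparkContext.jarOfClass(org.apache.hive.spark.client.RemoteDriver.class);
        // SparkClientImpl falls back to "spark-internal" when no jar is found.
        System.out.println(jar.isDefined() ? jar.get() : "spark-internal");
    }
}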
> 2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
>
>> You can probably avoid the problem by setting the environment variable
>> SPARK_HOME or the JVM property spark.home to point to your Spark installation.
>>
>> --Xuefu
>>
>> On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw> wrote:
>>
>> >  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
>> > executing org.apache.hadoop.hive.ql.Driver with java application.
>> >
>> > Following are my situations:
>> > 1.Building spark 1.4.1 assembly jar without Hive .
>> > 2.Uploading the spark assembly jar to the hadoop cluster.
>> > 3.Executing the java application with eclipse IDE in my client computer.
>> >
>> > The application went well and it submitted mr job to the yarn cluster
>> > successfully when using " hiveConf.set("hive.execution.engine", "mr")
>> > ",but it threw exceptions in spark-engine.
>> >
>> > Finally, i traced Hive source code and came to the conclusion:
>> >
>> > In my situation, SparkClientImpl class will generate the spark-submit
>> > shell and executed it.
>> > The shell command allocated  --class with RemoteDriver.class.getName()
>> > and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
>> > my application threw the exception.
>> >
>> > Is it right? And how can I do to execute the application with
>> > spark-engine successfully in my client computer ? Thanks a lot!
>> >
>> >
>> > Java application code:
>> >
>> > public class TestHiveDriver {
>> >
>> >         private static HiveConf hiveConf;
>> >         private static Driver driver;
>> >         private static CliSessionState ss;
>> >         public static void main(String[] args){
>> >
>> >                 String sql = "select * from hadoop0263_0 as a join
>> > hadoop0263_0 as b
>> > on (a.key = b.key)";
>> >                 ss = new CliSessionState(new
>> HiveConf(SessionState.class));
>> >                 hiveConf = new HiveConf(Driver.class);
>> >                 hiveConf.set("fs.default.name", "hdfs://storm0:9000");
>> >                 hiveConf.set("yarn.resourcemanager.address",
>> > "storm0:8032");
>> >                 hiveConf.set("yarn.resourcemanager.scheduler.address",
>> > "storm0:8030");
>> >
>> >
>> hiveConf.set("yarn.resourcemanager.resource-tracker.address","storm0:8031");
>> >                 hiveConf.set("yarn.resourcemanager.admin.address",
>> > "storm0:8033");
>> >                 hiveConf.set("mapreduce.framework.name", "yarn");
>> >                 hiveConf.set("mapreduce.johistory.address",
>> > "storm0:10020");
>> >
>> >
>> hiveConf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://storm0:3306/stana_metastore");
>> >
>> >
>> hiveConf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver");
>> >                 hiveConf.set("javax.jdo.option.ConnectionUserName",
>> > "root");
>> >                 hiveConf.set("javax.jdo.option.ConnectionPassword",
>> > "123456");
>> >                 hiveConf.setBoolean("hive.auto.convert.join",false);
>> >                 hiveConf.set("spark.yarn.jar",
>> > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
>> >                 hiveConf.set("spark.home","target/spark");
>> >                 hiveConf.set("hive.execution.engine", "spark");
>> >                 hiveConf.set("hive.dbname", "default");
>> >
>> >
>> >                 driver = new Driver(hiveConf);
>> >                 SessionState.start(hiveConf);
>> >
>> >                 CommandProcessorResponse res = null;
>> >                 try {
>> >                         res = driver.run(sql);
>> >                 } catch (CommandNeedRetryException e) {
>> >                         // TODO Auto-generated catch block
>> >                         e.printStackTrace();
>> >                 }
>> >
>> >                 System.out.println("Response Code:" +
>> > res.getResponseCode());
>> >                 System.out.println("Error Message:" +
>> > res.getErrorMessage());
>> >                 System.out.println("SQL State:" + res.getSQLState());
>> >
>> >         }
>> > }
>> >
>> >
>> >
>> >
>> > Exception of spark-engine:
>> >
>> > 16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
>> > argv:
>> >
>> /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
>> > --properties-file
>> >
>> >
>> /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties
>> > --class org.apache.hive.spark.client.RemoteDriver
>> >
>> >
>> /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
>> > --remote-host MacBook-Pro.local --remote-port 51331 --conf
>> > hive.spark.client.connect.timeout=1000 --conf
>> > hive.spark.client.server.connect.timeout=90000 --conf
>> > hive.spark.client.channel.log.level=null --conf
>> > hive.spark.client.rpc.max.size=52428800 --conf
>> > hive.spark.client.rpc.threads=8 --conf
>> > hive.spark.client.secret.bits=256
>> > 16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          client token: N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          diagnostics: N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster host:
>> > N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster RPC
>> > port: -1
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          queue: default
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          start time:
>> 1457180833494
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          final status: UNDEFINED
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          tracking URL:
>> > http://storm0:8088/proxy/application_1457002628102_0043/
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          user: stana
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
>> > Application report for application_1457002628102_0043 (state: FAILED)
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          client token: N/A
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          diagnostics:
>> Application
>> > application_1457002628102_0043 failed 1 times due to AM Container for
>> > appattempt_1457002628102_0043_000001 exited with  exitCode: -1000
>> > 16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output,
>> > check application tracking
>> > page:http://storm0:8088/proxy/application_1457002628102_0043/Then,
>> > click on links to logs of each attempt.
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics:
>> > java.io.FileNotFoundException: File
>> >
>> >
>> file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
>> > does not exist
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing
>> > the application.
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster host:
>> > N/A
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster RPC
>> > port: -1
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          queue: default
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          start time:
>> 1457180833494
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          final status: FAILED
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          tracking URL:
>> > http://storm0:8088/cluster/app/application_1457002628102_0043
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          user: stana
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main"
>> > org.apache.spark.SparkException: Application
>> > application_1457002628102_0043 finished with failed status
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client.main(Client.scala)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > java.lang.reflect.Method.invoke(Method.java:606)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
>> > ShutdownHookManager: Shutdown hook called
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
>> > ShutdownHookManager: Deleting directory
>> >
>> >
>> /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
>> > 16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client
>> > to connect.
>> > java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> > Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child
>> > process exited before connecting back
>> >         at
>> > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>> > ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
>> > [test-classes/:?]
>> > Caused by: java.lang.RuntimeException: Cancel client
>> > '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited
>> > before connecting back
>> >         at
>> >
>> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
>> > ~[hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
>> > ~[hive-exec-2.0.0.jar:2.0.0]
>> >         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
>> > 16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code
>> 1.
>> > FAILED: SemanticException Failed to get a spark session:
>> > org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
>> > spark client.
>> > 16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to
>> > get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
>> > Failed to create spark client.
>> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
>> > spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
>> > Failed to create spark client.
>> >         at
>> >
>> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
>> >         at
>> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
>> >         at
>> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
>> >         at
>> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
>> >
>>
>
>

Re: Error in Hive on Spark

Posted by Stana <st...@is-land.com.tw>.
Does anyone have suggestions on setting the hive-exec-2.0.0.jar path as a
property in the application?
Something like
'hiveConf.set("hive.remote.driver.jar","hdfs://storm0:9000/tmp/hive-exec-2.0.0.jar")'.



2016-03-11 10:53 GMT+08:00 Stana <st...@is-land.com.tw>:

> Thanks for reply
>
> I have set the property spark.home in my application. Otherwise the
> application threw 'SPARK_HOME not found exception'.
>
> I found hive source code in SparkClientImpl.java:
>
> private Thread startDriver(final RpcServer rpcServer, final String
> clientId, final String secret)
>       throws IOException {
> ...
>
> List<String> argv = Lists.newArrayList();
>
> ...
>
> argv.add("--class");
> argv.add(RemoteDriver.class.getName());
>
> String jar = "spark-internal";
> if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
> jar = SparkContext.jarOfClass(this.getClass()).get();
> }
> argv.add(jar);
>
> ...
>
> }
>
> When hive executed spark-submit , it generate the shell command with
> --class org.apache.hive.spark.client.RemoteDriver ,and set jar path with
> SparkContext.jarOfClass(this.getClass()).get(). It will get the local path
> of hive-exec-2.0.0.jar.
>
> In my situation, the application and yarn cluster are in different cluster.
> When application executed spark-submit with local path of
> hive-exec-2.0.0.jar to yarn cluster, there 's no hive-exec-2.0.0.jar in
> yarn cluster. Then application threw the exception: "hive-exec-2.0.0.jar
>   does not exist ...".
>
> Can it be set property of hive-exec-2.0.0.jar path in application ?
> Something like 'hiveConf.set("hive.remote.driver.jar",
> "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar")'.
> If not, is it possible to achieve in the future version?
>
>
>
>
> 2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:
>
>> You can probably avoid the problem by set environment variable SPARK_HOME
>> or JVM property spark.home that points to your spark installation.
>>
>> --Xuefu
>>
>> On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw> wrote:
>>
>> >  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
>> > executing org.apache.hadoop.hive.ql.Driver with java application.
>> >
>> > Following are my situations:
>> > 1.Building spark 1.4.1 assembly jar without Hive .
>> > 2.Uploading the spark assembly jar to the hadoop cluster.
>> > 3.Executing the java application with eclipse IDE in my client computer.
>> >
>> > The application went well and it submitted mr job to the yarn cluster
>> > successfully when using " hiveConf.set("hive.execution.engine", "mr")
>> > ",but it threw exceptions in spark-engine.
>> >
>> > Finally, i traced Hive source code and came to the conclusion:
>> >
>> > In my situation, SparkClientImpl class will generate the spark-submit
>> > shell and executed it.
>> > The shell command allocated  --class with RemoteDriver.class.getName()
>> > and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
>> > my application threw the exception.
>> >
>> > Is it right? And how can I do to execute the application with
>> > spark-engine successfully in my client computer ? Thanks a lot!
>> >
>> >
>> > Java application code:
>> >
>> > public class TestHiveDriver {
>> >
>> >         private static HiveConf hiveConf;
>> >         private static Driver driver;
>> >         private static CliSessionState ss;
>> >         public static void main(String[] args){
>> >
>> >                 String sql = "select * from hadoop0263_0 as a join
>> > hadoop0263_0 as b
>> > on (a.key = b.key)";
>> >                 ss = new CliSessionState(new
>> HiveConf(SessionState.class));
>> >                 hiveConf = new HiveConf(Driver.class);
>> >                 hiveConf.set("fs.default.name", "hdfs://storm0:9000");
>> >                 hiveConf.set("yarn.resourcemanager.address",
>> > "storm0:8032");
>> >                 hiveConf.set("yarn.resourcemanager.scheduler.address",
>> > "storm0:8030");
>> >
>> >
>> hiveConf.set("yarn.resourcemanager.resource-tracker.address","storm0:8031");
>> >                 hiveConf.set("yarn.resourcemanager.admin.address",
>> > "storm0:8033");
>> >                 hiveConf.set("mapreduce.framework.name", "yarn");
>> >                 hiveConf.set("mapreduce.johistory.address",
>> > "storm0:10020");
>> >
>> >
>> hiveConf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://storm0:3306/stana_metastore");
>> >
>> >
>> hiveConf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver");
>> >                 hiveConf.set("javax.jdo.option.ConnectionUserName",
>> > "root");
>> >                 hiveConf.set("javax.jdo.option.ConnectionPassword",
>> > "123456");
>> >                 hiveConf.setBoolean("hive.auto.convert.join",false);
>> >                 hiveConf.set("spark.yarn.jar",
>> > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
>> >                 hiveConf.set("spark.home","target/spark");
>> >                 hiveConf.set("hive.execution.engine", "spark");
>> >                 hiveConf.set("hive.dbname", "default");
>> >
>> >
>> >                 driver = new Driver(hiveConf);
>> >                 SessionState.start(hiveConf);
>> >
>> >                 CommandProcessorResponse res = null;
>> >                 try {
>> >                         res = driver.run(sql);
>> >                 } catch (CommandNeedRetryException e) {
>> >                         // TODO Auto-generated catch block
>> >                         e.printStackTrace();
>> >                 }
>> >
>> >                 System.out.println("Response Code:" +
>> > res.getResponseCode());
>> >                 System.out.println("Error Message:" +
>> > res.getErrorMessage());
>> >                 System.out.println("SQL State:" + res.getSQLState());
>> >
>> >         }
>> > }
>> >
>> >
>> >
>> >
>> > Exception of spark-engine:
>> >
>> > 16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
>> > argv:
>> >
>> /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
>> > --properties-file
>> >
>> >
>> /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties
>> > --class org.apache.hive.spark.client.RemoteDriver
>> >
>> >
>> /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
>> > --remote-host MacBook-Pro.local --remote-port 51331 --conf
>> > hive.spark.client.connect.timeout=1000 --conf
>> > hive.spark.client.server.connect.timeout=90000 --conf
>> > hive.spark.client.channel.log.level=null --conf
>> > hive.spark.client.rpc.max.size=52428800 --conf
>> > hive.spark.client.rpc.threads=8 --conf
>> > hive.spark.client.secret.bits=256
>> > 16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          client token: N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          diagnostics: N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster host:
>> > N/A
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster RPC
>> > port: -1
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          queue: default
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          start time:
>> 1457180833494
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          final status: UNDEFINED
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          tracking URL:
>> > http://storm0:8088/proxy/application_1457002628102_0043/
>> > 16/03/10 18:33:09 INFO SparkClientImpl:          user: stana
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
>> > Application report for application_1457002628102_0043 (state: FAILED)
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          client token: N/A
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          diagnostics:
>> Application
>> > application_1457002628102_0043 failed 1 times due to AM Container for
>> > appattempt_1457002628102_0043_000001 exited with  exitCode: -1000
>> > 16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output,
>> > check application tracking
>> > page:http://storm0:8088/proxy/application_1457002628102_0043/Then,
>> > click on links to logs of each attempt.
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics:
>> > java.io.FileNotFoundException: File
>> >
>> >
>> file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
>> > does not exist
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing
>> > the application.
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster host:
>> > N/A
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster RPC
>> > port: -1
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          queue: default
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          start time:
>> 1457180833494
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          final status: FAILED
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          tracking URL:
>> > http://storm0:8088/cluster/app/application_1457002628102_0043
>> > 16/03/10 18:33:10 INFO SparkClientImpl:          user: stana
>> > 16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main"
>> > org.apache.spark.SparkException: Application
>> > application_1457002628102_0043 finished with failed status
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.yarn.Client.main(Client.scala)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > java.lang.reflect.Method.invoke(Method.java:606)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> >
>> >
>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
>> > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
>> > ShutdownHookManager: Shutdown hook called
>> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
>> > ShutdownHookManager: Deleting directory
>> >
>> >
>> /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
>> > 16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client
>> > to connect.
>> > java.util.concurrent.ExecutionException: java.lang.RuntimeException:
>> > Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child
>> > process exited before connecting back
>> >         at
>> > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
>> > ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>> > [hive-exec-2.0.0.jar:2.0.0]
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
>> > [hive-exec-2.0.0.jar:?]
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
>> > [hive-exec-2.0.0.jar:?]
>> >         at
>> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
>> > [test-classes/:?]
>> > Caused by: java.lang.RuntimeException: Cancel client
>> > '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited
>> > before connecting back
>> >         at
>> >
>> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
>> > ~[hive-exec-2.0.0.jar:2.0.0]
>> >         at
>> >
>> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
>> > ~[hive-exec-2.0.0.jar:2.0.0]
>> >         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
>> > 16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code
>> 1.
>> > FAILED: SemanticException Failed to get a spark session:
>> > org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
>> > spark client.
>> > 16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to
>> > get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
>> > Failed to create spark client.
>> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
>> > spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
>> > Failed to create spark client.
>> >         at
>> >
>> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>> >         at
>> >
>> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
>> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
>> >         at
>> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
>> >         at
>> org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
>> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
>> >         at
>> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
>> >
>>
>
>

Re: Error in Hive on Spark

Posted by Stana <st...@is-land.com.tw>.
Thanks for the reply.

I have already set the spark.home property in my application; without it, the
application throws a 'SPARK_HOME not found' exception.

I found this in the Hive source code, in SparkClientImpl.java:

private Thread startDriver(final RpcServer rpcServer, final String clientId,
    final String secret) throws IOException {
  ...

  List<String> argv = Lists.newArrayList();

  ...

  argv.add("--class");
  argv.add(RemoteDriver.class.getName());

  String jar = "spark-internal";
  if (SparkContext.jarOfClass(this.getClass()).isDefined()) {
    jar = SparkContext.jarOfClass(this.getClass()).get();
  }
  argv.add(jar);

  ...
}

When Hive builds the spark-submit command, it sets --class to
org.apache.hive.spark.client.RemoteDriver and takes the application jar from
SparkContext.jarOfClass(this.getClass()).get(), which resolves to the local
path of hive-exec-2.0.0.jar.
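
As a quick check (my own sketch, not part of Hive; the class name is made up,
and it assumes hive-exec and the Spark jars are on the classpath), you can
print the jar path that will be handed to spark-submit:

import org.apache.hadoop.hive.ql.Driver;
import org.apache.spark.SparkContext;
import scala.Option;

public class PrintSubmittedJar {
    public static void main(String[] args) {
        // Driver lives in hive-exec-2.0.0.jar, so this resolves to the same
        // local jar path that SparkClientImpl passes to spark-submit.
        Option<String> jar = SparkContext.jarOfClass(Driver.class);
        System.out.println(jar.isDefined() ? jar.get() : "spark-internal");
    }
}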

In my situation, the application and the YARN cluster are on different
machines. When the application runs spark-submit against the YARN cluster
with the local path of hive-exec-2.0.0.jar, that jar does not exist on the
cluster, so the application throws the exception: "hive-exec-2.0.0.jar
does not exist ...".

Can the hive-exec-2.0.0.jar path be set as a property in the application?
Something like 'hiveConf.set("hive.remote.driver.jar",
"hdfs://storm0:9000/tmp/hive-exec-2.0.0.jar")'.
If not, could this be supported in a future version?



2016-03-10 23:51 GMT+08:00 Xuefu Zhang <xu...@uber.com>:

> You can probably avoid the problem by set environment variable SPARK_HOME
> or JVM property spark.home that points to your spark installation.
>
> --Xuefu
>
> On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw> wrote:
>
> >  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
> > executing org.apache.hadoop.hive.ql.Driver with java application.
> >
> > Following are my situations:
> > 1.Building spark 1.4.1 assembly jar without Hive .
> > 2.Uploading the spark assembly jar to the hadoop cluster.
> > 3.Executing the java application with eclipse IDE in my client computer.
> >
> > The application went well and it submitted mr job to the yarn cluster
> > successfully when using " hiveConf.set("hive.execution.engine", "mr")
> > ",but it threw exceptions in spark-engine.
> >
> > Finally, i traced Hive source code and came to the conclusion:
> >
> > In my situation, SparkClientImpl class will generate the spark-submit
> > shell and executed it.
> > The shell command allocated  --class with RemoteDriver.class.getName()
> > and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
> > my application threw the exception.
> >
> > Is it right? And how can I do to execute the application with
> > spark-engine successfully in my client computer ? Thanks a lot!
> >
> >
> > Java application code:
> >
> > public class TestHiveDriver {
> >
> >         private static HiveConf hiveConf;
> >         private static Driver driver;
> >         private static CliSessionState ss;
> >         public static void main(String[] args){
> >
> >                 String sql = "select * from hadoop0263_0 as a join
> > hadoop0263_0 as b
> > on (a.key = b.key)";
> >                 ss = new CliSessionState(new
> HiveConf(SessionState.class));
> >                 hiveConf = new HiveConf(Driver.class);
> >                 hiveConf.set("fs.default.name", "hdfs://storm0:9000");
> >                 hiveConf.set("yarn.resourcemanager.address",
> > "storm0:8032");
> >                 hiveConf.set("yarn.resourcemanager.scheduler.address",
> > "storm0:8030");
> >
> >
> hiveConf.set("yarn.resourcemanager.resource-tracker.address","storm0:8031");
> >                 hiveConf.set("yarn.resourcemanager.admin.address",
> > "storm0:8033");
> >                 hiveConf.set("mapreduce.framework.name", "yarn");
> >                 hiveConf.set("mapreduce.johistory.address",
> > "storm0:10020");
> >
> >
> hiveConf.set("javax.jdo.option.ConnectionURL","jdbc:mysql://storm0:3306/stana_metastore");
> >
> >
> hiveConf.set("javax.jdo.option.ConnectionDriverName","com.mysql.jdbc.Driver");
> >                 hiveConf.set("javax.jdo.option.ConnectionUserName",
> > "root");
> >                 hiveConf.set("javax.jdo.option.ConnectionPassword",
> > "123456");
> >                 hiveConf.setBoolean("hive.auto.convert.join",false);
> >                 hiveConf.set("spark.yarn.jar",
> > "hdfs://storm0:9000/tmp/spark-assembly-1.4.1-hadoop2.6.0.jar");
> >                 hiveConf.set("spark.home","target/spark");
> >                 hiveConf.set("hive.execution.engine", "spark");
> >                 hiveConf.set("hive.dbname", "default");
> >
> >
> >                 driver = new Driver(hiveConf);
> >                 SessionState.start(hiveConf);
> >
> >                 CommandProcessorResponse res = null;
> >                 try {
> >                         res = driver.run(sql);
> >                 } catch (CommandNeedRetryException e) {
> >                         // TODO Auto-generated catch block
> >                         e.printStackTrace();
> >                 }
> >
> >                 System.out.println("Response Code:" +
> > res.getResponseCode());
> >                 System.out.println("Error Message:" +
> > res.getErrorMessage());
> >                 System.out.println("SQL State:" + res.getSQLState());
> >
> >         }
> > }
> >
> >
> >
> >
> > Exception of spark-engine:
> >
> > 16/03/10 18:32:58 INFO SparkClientImpl: Running client driver with
> > argv:
> >
> /Volumes/Sdhd/Documents/project/island/java/apache/hive-200-test/hive-release-2.0.0/itests/hive-unit/target/spark/bin/spark-submit
> > --properties-file
> >
> >
> /var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-submit.7697089826296920539.properties
> > --class org.apache.hive.spark.client.RemoteDriver
> >
> >
> /Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
> > --remote-host MacBook-Pro.local --remote-port 51331 --conf
> > hive.spark.client.connect.timeout=1000 --conf
> > hive.spark.client.server.connect.timeout=90000 --conf
> > hive.spark.client.channel.log.level=null --conf
> > hive.spark.client.rpc.max.size=52428800 --conf
> > hive.spark.client.rpc.threads=8 --conf
> > hive.spark.client.secret.bits=256
> > 16/03/10 18:33:09 INFO SparkClientImpl: 16/03/10 18:33:09 INFO Client:
> > 16/03/10 18:33:09 INFO SparkClientImpl:          client token: N/A
> > 16/03/10 18:33:09 INFO SparkClientImpl:          diagnostics: N/A
> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster host:
> > N/A
> > 16/03/10 18:33:09 INFO SparkClientImpl:          ApplicationMaster RPC
> > port: -1
> > 16/03/10 18:33:09 INFO SparkClientImpl:          queue: default
> > 16/03/10 18:33:09 INFO SparkClientImpl:          start time:
> 1457180833494
> > 16/03/10 18:33:09 INFO SparkClientImpl:          final status: UNDEFINED
> > 16/03/10 18:33:09 INFO SparkClientImpl:          tracking URL:
> > http://storm0:8088/proxy/application_1457002628102_0043/
> > 16/03/10 18:33:09 INFO SparkClientImpl:          user: stana
> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
> > Application report for application_1457002628102_0043 (state: FAILED)
> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO Client:
> > 16/03/10 18:33:10 INFO SparkClientImpl:          client token: N/A
> > 16/03/10 18:33:10 INFO SparkClientImpl:          diagnostics: Application
> > application_1457002628102_0043 failed 1 times due to AM Container for
> > appattempt_1457002628102_0043_000001 exited with  exitCode: -1000
> > 16/03/10 18:33:10 INFO SparkClientImpl: For more detailed output,
> > check application tracking
> > page:http://storm0:8088/proxy/application_1457002628102_0043/Then,
> > click on links to logs of each attempt.
> > 16/03/10 18:33:10 INFO SparkClientImpl: Diagnostics:
> > java.io.FileNotFoundException: File
> >
> >
> file:/Users/stana/.m2/repository/org/apache/hive/hive-exec/2.0.0/hive-exec-2.0.0.jar
> > does not exist
> > 16/03/10 18:33:10 INFO SparkClientImpl: Failing this attempt. Failing
> > the application.
> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster host:
> > N/A
> > 16/03/10 18:33:10 INFO SparkClientImpl:          ApplicationMaster RPC
> > port: -1
> > 16/03/10 18:33:10 INFO SparkClientImpl:          queue: default
> > 16/03/10 18:33:10 INFO SparkClientImpl:          start time:
> 1457180833494
> > 16/03/10 18:33:10 INFO SparkClientImpl:          final status: FAILED
> > 16/03/10 18:33:10 INFO SparkClientImpl:          tracking URL:
> > http://storm0:8088/cluster/app/application_1457002628102_0043
> > 16/03/10 18:33:10 INFO SparkClientImpl:          user: stana
> > 16/03/10 18:33:10 INFO SparkClientImpl: Exception in thread "main"
> > org.apache.spark.SparkException: Application
> > application_1457002628102_0043 finished with failed status
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.yarn.Client.run(Client.scala:920)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.yarn.Client.main(Client.scala)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > java.lang.reflect.Method.invoke(Method.java:606)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> >
> >
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> > 16/03/10 18:33:10 INFO SparkClientImpl:         at
> > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
> > ShutdownHookManager: Shutdown hook called
> > 16/03/10 18:33:10 INFO SparkClientImpl: 16/03/10 18:33:10 INFO
> > ShutdownHookManager: Deleting directory
> >
> >
> /private/var/folders/vt/cjcdhms903x7brn1kbh558s40000gn/T/spark-5b92ce20-b6f8-4832-8b15-5e98bd0e0705
> > 16/03/10 18:33:10 WARN SparkClientImpl: Error while waiting for client
> > to connect.
> > java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> > Cancel client '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child
> > process exited before connecting back
> >         at
> > io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
> > ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> >         at
> >
> org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:101)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:98)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:94)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:63)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:55)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:114)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:131)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:117)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
> > [hive-exec-2.0.0.jar:2.0.0]
> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
> > [hive-exec-2.0.0.jar:?]
> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
> > [hive-exec-2.0.0.jar:?]
> >         at
> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
> > [hive-exec-2.0.0.jar:?]
> >         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
> > [hive-exec-2.0.0.jar:?]
> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
> > [hive-exec-2.0.0.jar:?]
> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
> > [hive-exec-2.0.0.jar:?]
> >         at
> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
> > [test-classes/:?]
> > Caused by: java.lang.RuntimeException: Cancel client
> > '5bda93c0-865b-48a8-b368-c2fcc30e81e8'. Error: Child process exited
> > before connecting back
> >         at
> >
> org.apache.hive.spark.client.rpc.RpcServer.cancelClient(RpcServer.java:179)
> > ~[hive-exec-2.0.0.jar:2.0.0]
> >         at
> >
> org.apache.hive.spark.client.SparkClientImpl$3.run(SparkClientImpl.java:450)
> > ~[hive-exec-2.0.0.jar:2.0.0]
> >         at java.lang.Thread.run(Thread.java:745) ~[?:1.7.0_67]
> > 16/03/10 18:33:10 WARN SparkClientImpl: Child process exited with code 1.
> > FAILED: SemanticException Failed to get a spark session:
> > org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create
> > spark client.
> > 16/03/10 18:33:10 ERROR Driver: FAILED: SemanticException Failed to
> > get a spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
> > Failed to create spark client.
> > org.apache.hadoop.hive.ql.parse.SemanticException: Failed to get a
> > spark session: org.apache.hadoop.hive.ql.metadata.HiveException:
> > Failed to create spark client.
> >         at
> >
> org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:121)
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> >         at
> >
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:181)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:119)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10195)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
> >         at
> >
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:479)
> >         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:319)
> >         at
> > org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1255)
> >         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1301)
> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1184)
> >         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1172)
> >         at
> > org.apache.hadoop.hive.ql.TestHiveDriver.main(TestHiveDriver.java:41)
> >
>

Re: Error in Hive on Spark

Posted by Xuefu Zhang <xu...@uber.com>.
You can probably avoid the problem by setting the environment variable SPARK_HOME
or the JVM property spark.home so that it points to your Spark installation.
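
For example, a minimal sketch of the change in the poster's TestHiveDriver (the
installation path below is only a placeholder; use whatever directory on the
client machine actually contains bin/spark-submit):

        // Placeholder location of a full Spark 1.4.1 installation on the client machine.
        String sparkHome = "/opt/spark-1.4.1-bin-hadoop2.6";

        // Either set it as a JVM system property ...
        System.setProperty("spark.home", sparkHome);
        // ... or put it on the HiveConf before constructing the Driver, replacing the
        // relative "target/spark" value used in the original code:
        hiveConf.set("spark.home", sparkHome);

Alternatively, export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6 (again, a
placeholder path) in the environment before launching the application, so that
spark-submit can be located without any code change.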

--Xuefu

On Thu, Mar 10, 2016 at 3:11 AM, Stana <st...@is-land.com.tw> wrote:

>  I am trying out Hive on Spark with hive 2.0.0 and spark 1.4.1, and
> executing org.apache.hadoop.hive.ql.Driver with java application.
>
> Following are my situations:
> 1.Building spark 1.4.1 assembly jar without Hive .
> 2.Uploading the spark assembly jar to the hadoop cluster.
> 3.Executing the java application with eclipse IDE in my client computer.
>
> The application went well and it submitted mr job to the yarn cluster
> successfully when using " hiveConf.set("hive.execution.engine", "mr")
> ",but it threw exceptions in spark-engine.
>
> Finally, i traced Hive source code and came to the conclusion:
>
> In my situation, SparkClientImpl class will generate the spark-submit
> shell and executed it.
> The shell command allocated  --class with RemoteDriver.class.getName()
> and jar with SparkContext.jarOfClass(this.getClass()).get(), so that
> my application threw the exception.
>
> Is it right? And how can I do to execute the application with
> spark-engine successfully in my client computer ? Thanks a lot!