Posted to user@spark.apache.org by Sudha KS <Su...@fuzzylogix.com> on 2017/07/04 08:07:34 UTC

SparkSession via HS2 - is it supported?

This is what the code does: I created a Java class that extends org.apache.hadoop.hive.ql.udf.generic.GenericUDTF. Inside it, the class creates a SparkSession:
    SparkSession spark = SparkSession.builder().enableHiveSupport().master("yarn-client").appName("SampleSparkUDTF_yarnV1").getOrCreate();
and then tries to read a table from the Hive DB:
    Dataset inputData = spark.read().table(tableName);
    Long countRows = inputData.count();


Environment: HDP-2.5.3.0, Spark 2.0.0


Steps:
1. Copied the custom UDTF jar into HDFS and also into auxlib.
2. Copied /usr/hdp/<2.5.x>/spark2/jars/*.jar into /usr/hdp/<2.5.x>/hive/auxlib/
3. Connected to HS2 using beeline to run this Spark UDTF:

beeline -u jdbc:hive2://localhost:10000 -d org.apache.hive.jdbc.HiveDriver

CREATE TABLE TestTable (i int);
INSERT INTO TestTable VALUES (1);

0: jdbc:hive2://localhost:10000/> CREATE FUNCTION SparkUDT AS 'SparkHiveUDTF' using jar 'hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar' ;
INFO  : converting to local hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar
INFO  : Added [/tmp/69366d0d-6777-4860-82c0-c61482ccce87_resources/sparkHiveGenericUDTF-1.0.jar] to class path
INFO  : Added resources: [hdfs:///tmp/sparkHiveGenericUDTF-1.0.jar]
No rows affected (0.125 seconds)

0: jdbc:hive2://localhost:10000/> SELECT SparkUDT('tbl','TestTable');

failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 17 more
Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
        at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97)
        at SparkHiveUDTF.process(SparkHiveUDTF.java:78)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.process(UDTFOperator.java:109)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
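
A note on reading the trace above: the outer RuntimeException and HiveException entries are wrappers, and the last "Caused by:" entry (the SparkException raised from SparkSession.getOrCreate inside SparkHiveUDTF.process, running in a Tez task) is the innermost cause. The following is only a rough sketch, using the plain Java standard library, of how that last cause could be pulled out of a saved trace. The class name RootCause and the method lastCause are hypothetical helpers made up for illustration; they are not part of Hive, Tez, or Spark.

```java
import java.util.Optional;

// Hypothetical helper (not a Hive/Spark API): finds the innermost cause in a
// textual Java stack trace by keeping the last line that starts with "Caused by: ".
class RootCause {
    private static final String PREFIX = "Caused by: ";

    static Optional<String> lastCause(String trace) {
        String last = null;
        // "\\R" matches any line break, so both \n and \r\n traces work.
        for (String line : trace.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith(PREFIX)) {
                last = trimmed.substring(PREFIX.length());
            }
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the trace from this post.
        String trace = String.join("\n",
            "Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)",
            "        ... 14 more",
            "Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable (null)",
            "        ... 17 more",
            "Caused by: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.",
            "        at SparkHiveUDTF.sparkJob(SparkHiveUDTF.java:97)");
        System.out.println(lastCause(trace).orElse("(no cause found)"));
    }
}
```

Run against the excerpt, this selects the SparkException line, which is the part of the trace the question is about.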




Is launching a SparkSession via HS2 (from inside a Hive UDTF, as above) supported?