Posted to user@spark.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2016/01/27 16:07:23 UTC

Having issue with Spark SQL JDBC on hive table !!!

Hi All,

I have configured Spark SQL to query a Hive table.

I start the Thrift JDBC/ODBC server with the command below:

cd $SPARK_HOME
./sbin/start-thriftserver.sh --master spark://myhost:7077 \
  --hiveconf hive.server2.thrift.bind.host=myhost \
  --hiveconf hive.server2.thrift.port=9999

I am also able to connect through beeline:

beeline> !connect jdbc:hive2://192.168.145.20:9999
Enter username for jdbc:hive2://192.168.145.20:9999: root
Enter password for jdbc:hive2://192.168.145.20:9999: impetus
beeline>

Queries on the Hive table return no results over the Spark SQL JDBC (Thrift server) connection, but the same query works through Spark's HiveContext. The complete scenario is explained below.

Can someone help me understand why Spark SQL over JDBC is not returning results?

Below are the version details:

Hive Version   : 1.2.1
Hadoop Version : 2.6.0
Spark Version  : 1.3.1

Let me know if you need any other details.


Create the Hive table, insert some records, and query it:

beeline> !connect jdbc:hive2://myhost:10000
Enter username for jdbc:hive2://myhost:10000: root
Enter password for jdbc:hive2://myhost:10000: ******
beeline> create table tampTable(id int, name string) clustered by (id)
         into 2 buckets stored as orc TBLPROPERTIES('transactional'='true');
beeline> insert into table tampTable values (1,'row1'),(2,'row2'),(3,'row3');
beeline> select name from tampTable;
name
---------
row1
row3
row2

Query through Spark SQL HiveContext:

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

SparkConf sparkConf = new SparkConf().setAppName("JavaSparkSQL");
SparkContext sc = new SparkContext(sparkConf);
HiveContext hiveContext = new HiveContext(sc);

// Run the query against the Hive table through the HiveContext
DataFrame teenagers = hiveContext.sql("SELECT name FROM tampTable");
List<String> teenagerNames = teenagers.toJavaRDD().map(new Function<Row, String>() {
  @Override
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();

for (String name : teenagerNames) {
  System.out.println(name);
}

// Save the query result as text files
teenagers.toJavaRDD().saveAsTextFile("/tmp1");
sc.stop();

This works perfectly and returns all names from the table tampTable.

Query through Spark SQL JDBC (Thrift server):

beeline> !connect jdbc:hive2://myhost:9999
Enter username for jdbc:hive2://myhost:9999: root
Enter password for jdbc:hive2://myhost:9999: ******
beeline> show tables;
temptable
..other tables
beeline> SELECT name FROM tampTable;

I can list the table through "show tables", but when I run the query it either hangs or returns nothing.
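
For reference, this is essentially the path beeline exercises; the same query can also be issued programmatically through the Hive JDBC driver. The snippet below is only an illustrative sketch (not from the original post): it assumes the hive-jdbc jar and its dependencies are on the client classpath and that the Thrift server started above is listening on myhost:9999.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkThriftJdbcExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2-compatible JDBC driver shipped with Hive
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // Same host/port/credentials used for the beeline connection above (password masked)
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://myhost:9999", "root", "******");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT name FROM tampTable")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}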



Regards
Sanjiv Singh
Mob :  +091 9990-447-339

Re: Having issue with Spark SQL JDBC on hive table !!!

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
It is working now.

I checked the Spark worker UI; executor startup was failing with the error below, because JVM initialization failed due to an invalid -Xms value:

Invalid initial heap size: -Xms0M
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

The Thrift server was not picking up the executor memory setting from spark-env.sh, so I added it explicitly to the Thrift server startup script.

./sbin/start-thriftserver.sh:

exec "$FWDIR"/sbin/spark-daemon.sh spark-submit $CLASS 1 --executor-memory 512M "$@"

With this, executors now start with a valid memory setting and the JDBC queries return results.

conf/spark-env.sh (these executor memory settings were not being picked up by the Thrift server):

export SPARK_JAVA_OPTS="-Dspark.executor.memory=512M"
export SPARK_EXECUTOR_MEMORY=512M
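
For what it's worth, the same setting can usually be supplied without editing the daemon script, since start-thriftserver.sh forwards spark-submit options; a sketch under that assumption (and the alternative of setting it once in spark-defaults.conf):

cd $SPARK_HOME
./sbin/start-thriftserver.sh \
  --master spark://myhost:7077 \
  --executor-memory 512m \
  --hiveconf hive.server2.thrift.bind.host=myhost \
  --hiveconf hive.server2.thrift.port=9999

# or, equivalently, in conf/spark-defaults.conf:
# spark.executor.memory   512m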


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Thu, Jan 28, 2016 at 10:57 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> [earlier messages in the thread quoted here; snipped -- they appear in full below]

Re: Having issue with Spark SQL JDBC on hive table !!!

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Adding to it:

Job status in the Spark UI:

Stage Id    : 1
Description : select ename from employeetest - collect at SparkPlan.scala:84
Submitted   : 2016/01/29 04:20:06
Duration    : 3.0 min
Tasks       : 0/2 succeeded

Getting the stack trace below in the Spark UI:

org.apache.spark.rdd.RDD.collect(RDD.scala:813)
org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
org.apache.spark.sql.DataFrame.collect(DataFrame.scala:887)
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:178)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Thu, Jan 28, 2016 at 9:57 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> [earlier messages in the thread quoted here; snipped]

Re: Having issue with Spark SQL JDBC on hive table !!!

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Any help on this.

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jan 27, 2016 at 10:25 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> [earlier messages in the thread quoted here; snipped]

Re: Having issue with Spark SQL JDBC on hive table !!!

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Hi Ted,
It's a typo.


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Wed, Jan 27, 2016 at 9:13 PM, Ted Yu <yu...@gmail.com> wrote:

> [earlier messages in the thread quoted here; snipped]

Re: Having issue with Spark SQL JDBC on hive table !!!

Posted by Ted Yu <yu...@gmail.com>.
In the last snippet, temptable is shown by the 'show tables' command, yet you queried tampTable.

I believe this was just a typo :-)

On Wed, Jan 27, 2016 at 7:07 AM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> [original message quoted here; snipped]