Posted to dev@spark.apache.org by Sadhan Sood <sa...@gmail.com> on 2014/11/11 01:29:27 UTC

thrift jdbc server probably running queries as hive query

I was testing out the Spark Thrift JDBC server by running a simple query in
the beeline client. Spark itself is running on a YARN cluster.

However, when I run a query in beeline, I see no running jobs in the
Spark UI (it is completely empty), and the YARN UI seems to indicate that
the submitted query is being run as a MapReduce job. The Spark logs below
probably indicate this as well, but I am not completely sure:

2014-11-11 00:19:00,492 INFO  ql.Context
(Context.java:getMRScratchDir(267)) - New scratch dir is
hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1

2014-11-11 00:19:00,877 INFO  ql.Context
(Context.java:getMRScratchDir(267)) - New scratch dir is
hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

2014-11-11 00:19:04,152 INFO  ql.Context
(Context.java:getMRScratchDir(267)) - New scratch dir is
hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

2014-11-11 00:19:04,425 INFO  Configuration.deprecation
(Configuration.java:warnOnceIfDeprecated(1009)) - mapred.submit.replication
is deprecated. Instead, use mapreduce.client.submit.file.replication

2014-11-11 00:19:04,516 INFO  client.RMProxy
(RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
at xxxxxxxx:8032

2014-11-11 00:19:04,607 INFO  client.RMProxy
(RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
at xxxxxxxx:8032

2014-11-11 00:19:04,639 WARN  mapreduce.JobSubmitter
(JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this

2014-11-11 00:00:08,806 INFO  input.FileInputFormat
(FileInputFormat.java:listStatus(287)) - Total input paths to process :
14912

2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
(GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library

2014-11-11 00:00:08,866 INFO  lzo.LzoCodec (LzoCodec.java:<clinit>(76)) -
Successfully loaded & initialized native-lzo library [hadoop-lzo rev
8e266e052e423af592871e2dfe09d54c03f6a0e8]

2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat
(CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated node
allocation with : CompletedNodes: 1, size left: 194541317

2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter
(JobSubmitter.java:submitJobInternal(396)) - number of splits:615

2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter
(JobSubmitter.java:printTokens(479)) - Submitting tokens for job:
job_1414084656759_0115

2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl
(YarnClientImpl.java:submitApplication(167)) - Submitted application
application_1414084656759_0115


It seems like the query is being run as a Hive query instead of a Spark
query. The same query works fine when run from the spark-sql CLI.
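
For reference, this is roughly how the server is started and queried (a
sketch, assuming the default Thrift port 10000; SPARK_HOME and the host
name are placeholders):

  # start the Spark Thrift JDBC server on YARN
  $SPARK_HOME/sbin/start-thriftserver.sh --master yarn

  # connect with the bundled beeline client and run the query there
  $SPARK_HOME/bin/beeline -u jdbc:hive2://localhost:10000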

Re: thrift jdbc server probably running queries as hive query

Posted by Cheng Lian <li...@gmail.com>.
Hey Sadhan,

Sorry for my previous abrupt reply. Submitting an MR job is definitely
wrong here; I'm investigating. Would you mind providing the
Spark/Hive/Hadoop versions you are using? If you're using the most recent
master branch, a concrete commit SHA-1 would be very helpful.
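
A sketch of how to collect those, assuming the standard command-line
tools are on the PATH and Spark was built from a git checkout:

  # Spark, Hive, and Hadoop versions
  $SPARK_HOME/bin/spark-submit --version
  hive --version
  hadoop version

  # commit SHA-1, run from inside the Spark source checkout
  git rev-parse HEAD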

Thanks!
Cheng


On 11/12/14 12:34 AM, Sadhan Sood wrote:
> Hi Cheng,
>
> I made sure the only Hive server running on the machine is 
> HiveThriftServer2.
>
> /usr/lib/jvm/default-java/bin/java -cp 
> /usr/lib/hadoop/lib/hadoop-lzo.jar::/mnt/sadhan/spark-3/sbin/../conf:/mnt/sadhan/spark-3/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.2.jar:/etc/hadoop/conf 
> -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class 
> org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master yarn 
> --jars reporting.jar spark-internal
>
> The query I am running is a simple count(*): "select count(*) from Xyz 
> where date_prefix=20141031", and I am pretty sure it's submitting a 
> MapReduce job, based on the Spark logs:
>
> TakesRest=false
>
> Total jobs = 1
>
> Launching Job 1 out of 1
>
> Number of reduce tasks determined at compile time: 1
>
> In order to change the average load for a reducer (in bytes):
>
>   set hive.exec.reducers.bytes.per.reducer=<number>
>
> In order to limit the maximum number of reducers:
>
>   set hive.exec.reducers.max=<number>
>
> In order to set a constant number of reducers:
>
>   set mapreduce.job.reduces=<number>
>
> 14/11/11 16:23:17 INFO ql.Context: New scratch dir is 
> hdfs://fdsfdsfsdfsdf:9000/tmp/hive-ubuntu/hive_2014-11-11_16-23-17_333_5669798325805509526-2
>
> Starting Job = job_1414084656759_0142, Tracking URL = 
> http://xxxxxxx:8100/proxy/application_1414084656759_0142/ 
>
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill 
> job_1414084656759_0142
>
>
> On Mon, Nov 10, 2014 at 9:59 PM, Cheng Lian <lian.cs.zju@gmail.com 
> <ma...@gmail.com>> wrote:
>
>     Hey Sadhan,
>
>     I really don't think this is a Spark log... Unlike Shark, Spark SQL
>     doesn't even provide a Hive mode to let you execute queries
>     against Hive. Would you please check whether there is an existing
>     HiveServer2 running there? Spark SQL HiveThriftServer2 is just a
>     Spark port of HiveServer2, and they share the same default
>     listening port. I guess the Thrift server didn't start
>     successfully because the HiveServer2 occupied the port, and your
>     Beeline session was probably linked against HiveServer2.
>
>     Cheng
>
>
>     On 11/11/14 8:29 AM, Sadhan Sood wrote:
>>     I was testing out the Spark Thrift JDBC server by running a
>>     simple query in the beeline client. Spark itself is running
>>     on a YARN cluster.
>>
>>     However, when I run a query in beeline, I see no running jobs
>>     in the Spark UI (it is completely empty), and the YARN UI seems to
>>     indicate that the submitted query is being run as a MapReduce
>>     job. The Spark logs below probably indicate this as well,
>>     but I am not completely sure:
>>
>>     2014-11-11 00:19:00,492 INFO  ql.Context
>>     (Context.java:getMRScratchDir(267)) - New scratch dir is
>>     hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1
>>
>>     2014-11-11 00:19:00,877 INFO  ql.Context
>>     (Context.java:getMRScratchDir(267)) - New scratch dir is
>>     hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>>
>>     2014-11-11 00:19:04,152 INFO  ql.Context
>>     (Context.java:getMRScratchDir(267)) - New scratch dir is
>>     hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>>
>>     2014-11-11 00:19:04,425 INFO Configuration.deprecation
>>     (Configuration.java:warnOnceIfDeprecated(1009)) -
>>     mapred.submit.replication is deprecated. Instead, use
>>     mapreduce.client.submit.file.replication
>>
>>     2014-11-11 00:19:04,516 INFO client.RMProxy
>>     (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
>>     at xxxxxxxx:8032
>>
>>     2014-11-11 00:19:04,607 INFO client.RMProxy
>>     (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
>>     at xxxxxxxx:8032
>>
>>     2014-11-11 00:19:04,639 WARN mapreduce.JobSubmitter
>>     (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop
>>     command-line option parsing not performed. Implement the Tool
>>     interface and execute your application with ToolRunner to remedy this
>>
>>     2014-11-11 00:00:08,806 INFO  input.FileInputFormat
>>     (FileInputFormat.java:listStatus(287)) - Total input paths to
>>     process : 14912
>>
>>     2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
>>     (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
>>
>>     2014-11-11 00:00:08,866 INFO  lzo.LzoCodec
>>     (LzoCodec.java:<clinit>(76)) - Successfully loaded & initialized
>>     native-lzo library [hadoop-lzo rev
>>     8e266e052e423af592871e2dfe09d54c03f6a0e8]
>>
>>     2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat
>>     (CombineFileInputFormat.java:createSplits(413)) - DEBUG:
>>     Terminated node allocation with : CompletedNodes: 1, size left:
>>     194541317
>>
>>     2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter
>>     (JobSubmitter.java:submitJobInternal(396)) - number of splits:615
>>
>>     2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter
>>     (JobSubmitter.java:printTokens(479)) - Submitting tokens for job:
>>     job_1414084656759_0115
>>
>>     2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl
>>     (YarnClientImpl.java:submitApplication(167)) - Submitted
>>     application application_1414084656759_0115
>>
>>
>>     It seems like the query is being run as a Hive query instead of a
>>     Spark query. The same query works fine when run from the spark-sql CLI.
>>
>
>


Re: thrift jdbc server probably running queries as hive query

Posted by Sadhan Sood <sa...@gmail.com>.
Hi Cheng,

I made sure the only Hive server running on the machine is
HiveThriftServer2.

/usr/lib/jvm/default-java/bin/java -cp
/usr/lib/hadoop/lib/hadoop-lzo.jar::/mnt/sadhan/spark-3/sbin/../conf:/mnt/sadhan/spark-3/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.2.jar:/etc/hadoop/conf
-Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class
org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master yarn
--jars reporting.jar spark-internal
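
As a cross-check that this process is the one actually serving beeline,
it may help to look at what YARN knows about it (a sketch, assuming the
yarn CLI is configured for this cluster):

  # the Thrift server should show up as a long-running Spark application
  yarn application -list

Any query it executes should then surface as jobs inside that
application's Spark UI, not as separate MapReduce applications.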

The query I am running is a simple count(*): "select count(*) from Xyz
where date_prefix=20141031", and I am pretty sure it's submitting a
MapReduce job, based on the Spark logs:

TakesRest=false

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

14/11/11 16:23:17 INFO ql.Context: New scratch dir is
hdfs://fdsfdsfsdfsdf:9000/tmp/hive-ubuntu/hive_2014-11-11_16-23-17_333_5669798325805509526-2

Starting Job = job_1414084656759_0142, Tracking URL =
http://xxxxxxx:8100/proxy/application_1414084656759_0142/

Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414084656759_0142

On Mon, Nov 10, 2014 at 9:59 PM, Cheng Lian <li...@gmail.com> wrote:

>  Hey Sadhan,
>
> I really don't think this is a Spark log... Unlike Shark, Spark SQL doesn't
> even provide a Hive mode to let you execute queries against Hive. Would you
> please check whether there is an existing HiveServer2 running there? Spark
> SQL HiveThriftServer2 is just a Spark port of HiveServer2, and they share
> the same default listening port. I guess the Thrift server didn't start
> successfully because the HiveServer2 occupied the port, and your Beeline
> session was probably linked against HiveServer2.
>
> Cheng
>
>
> On 11/11/14 8:29 AM, Sadhan Sood wrote:
>
> I was testing out the Spark Thrift JDBC server by running a simple query
> in the beeline client. Spark itself is running on a YARN cluster.
>
> However, when I run a query in beeline, I see no running jobs in the
> Spark UI (it is completely empty), and the YARN UI seems to indicate that
> the submitted query is being run as a MapReduce job. The Spark logs below
> probably indicate this as well, but I am not completely sure:
>
>  2014-11-11 00:19:00,492 INFO  ql.Context
> (Context.java:getMRScratchDir(267)) - New scratch dir is
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1
>
> 2014-11-11 00:19:00,877 INFO  ql.Context
> (Context.java:getMRScratchDir(267)) - New scratch dir is
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>
> 2014-11-11 00:19:04,152 INFO  ql.Context
> (Context.java:getMRScratchDir(267)) - New scratch dir is
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>
> 2014-11-11 00:19:04,425 INFO  Configuration.deprecation
> (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.submit.replication
> is deprecated. Instead, use mapreduce.client.submit.file.replication
>
> 2014-11-11 00:19:04,516 INFO  client.RMProxy
> (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
> at xxxxxxxx:8032
>
> 2014-11-11 00:19:04,607 INFO  client.RMProxy
> (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
> at xxxxxxxx:8032
>
> 2014-11-11 00:19:04,639 WARN  mapreduce.JobSubmitter
> (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option
> parsing not performed. Implement the Tool interface and execute your
> application with ToolRunner to remedy this
>
> 2014-11-11 00:00:08,806 INFO  input.FileInputFormat
> (FileInputFormat.java:listStatus(287)) - Total input paths to process :
> 14912
>
> 2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
> (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
>
> 2014-11-11 00:00:08,866 INFO  lzo.LzoCodec (LzoCodec.java:<clinit>(76)) -
> Successfully loaded & initialized native-lzo library [hadoop-lzo rev
> 8e266e052e423af592871e2dfe09d54c03f6a0e8]
>
> 2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat
> (CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated node
> allocation with : CompletedNodes: 1, size left: 194541317
>
> 2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter
> (JobSubmitter.java:submitJobInternal(396)) - number of splits:615
>
> 2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter
> (JobSubmitter.java:printTokens(479)) - Submitting tokens for job:
> job_1414084656759_0115
>
> 2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl
> (YarnClientImpl.java:submitApplication(167)) - Submitted application
> application_1414084656759_0115
>
>
>  It seems like the query is being run as a Hive query instead of a Spark
> query. The same query works fine when run from the spark-sql CLI.
>
>
>

Re: thrift jdbc server probably running queries as hive query

Posted by Cheng Lian <li...@gmail.com>.
Hey Sadhan,

I really don't think this is a Spark log... Unlike Shark, Spark SQL
doesn't even provide a Hive mode to let you execute queries against
Hive. Would you please check whether there is an existing HiveServer2
running there? Spark SQL's HiveThriftServer2 is just a Spark port of
HiveServer2, and they share the same default listening port. I guess the
Thrift server didn't start successfully because an existing HiveServer2
had already occupied the port, and your Beeline session was probably
connected to that HiveServer2 instead.
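
One way to verify this (a sketch; assumes the default listening port
10000 and that lsof is available):

  # which process is bound to the shared default port?
  sudo lsof -i :10000

  # is a standalone HiveServer2 running alongside the Spark one?
  ps aux | grep -i [H]iveServer2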

Cheng

On 11/11/14 8:29 AM, Sadhan Sood wrote:
> I was testing out the Spark Thrift JDBC server by running a simple 
> query in the beeline client. Spark itself is running on a YARN 
> cluster.
>
> However, when I run a query in beeline, I see no running jobs in the 
> Spark UI (it is completely empty), and the YARN UI seems to indicate 
> that the submitted query is being run as a MapReduce job. The Spark 
> logs below probably indicate this as well, but I am not completely sure:
>
> 2014-11-11 00:19:00,492 INFO  ql.Context 
> (Context.java:getMRScratchDir(267)) - New scratch dir is 
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1
>
> 2014-11-11 00:19:00,877 INFO  ql.Context 
> (Context.java:getMRScratchDir(267)) - New scratch dir is 
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>
> 2014-11-11 00:19:04,152 INFO  ql.Context 
> (Context.java:getMRScratchDir(267)) - New scratch dir is 
> hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2
>
> 2014-11-11 00:19:04,425 INFO Configuration.deprecation 
> (Configuration.java:warnOnceIfDeprecated(1009)) - 
> mapred.submit.replication is deprecated. Instead, use 
> mapreduce.client.submit.file.replication
>
> 2014-11-11 00:19:04,516 INFO  client.RMProxy 
> (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager 
> at xxxxxxxx:8032
>
> 2014-11-11 00:19:04,607 INFO  client.RMProxy 
> (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager 
> at xxxxxxxx:8032
>
> 2014-11-11 00:19:04,639 WARN mapreduce.JobSubmitter 
> (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line 
> option parsing not performed. Implement the Tool interface and execute 
> your application with ToolRunner to remedy this
>
> 2014-11-11 00:00:08,806 INFO  input.FileInputFormat 
> (FileInputFormat.java:listStatus(287)) - Total input paths to process 
> : 14912
>
> 2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader 
> (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library
>
> 2014-11-11 00:00:08,866 INFO  lzo.LzoCodec 
> (LzoCodec.java:<clinit>(76)) - Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev 
> 8e266e052e423af592871e2dfe09d54c03f6a0e8]
>
> 2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat 
> (CombineFileInputFormat.java:createSplits(413)) - DEBUG: Terminated 
> node allocation with : CompletedNodes: 1, size left: 194541317
>
> 2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter 
> (JobSubmitter.java:submitJobInternal(396)) - number of splits:615
>
> 2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter 
> (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: 
> job_1414084656759_0115
>
> 2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl 
> (YarnClientImpl.java:submitApplication(167)) - Submitted application 
> application_1414084656759_0115
>
>
> It seems like the query is being run as a Hive query instead of a Spark 
> query. The same query works fine when run from the spark-sql CLI.
>


Re: thrift jdbc server probably running queries as hive query

Posted by scwf <wa...@huawei.com>.
Did the SQL run successfully? And what SQL are you running?



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/thrift-jdbc-server-probably-running-queries-as-hive-query-tp9267p9268.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
