Posted to user@spark.apache.org by Benjamin Kim <bb...@gmail.com> on 2016/09/13 21:32:54 UTC

Spark SQL Thriftserver

Does anyone have any thoughts about using Spark SQL Thriftserver in Spark 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for it. Any advice and gotchas would be good to know.

Thanks,
Ben
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Spark SQL Thriftserver

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi, all

Spark STS just uses HiveContext internally and does not use MR.
That said, Spark STS is missing some HiveServer2 functionality such as HA (see:
https://issues.apache.org/jira/browse/SPARK-11100) and has some known
issues. So you'd be better off checking all the JIRA issues related to STS
before considering the replacement.

// maropu

On Wed, Sep 14, 2016 at 8:55 AM, ayan guha <gu...@gmail.com> wrote:

> Hi
>
> AFAIK STS uses Spark SQL and not Map Reduce. Is that not correct?
>
> Best
> Ayan
>
> On Wed, Sep 14, 2016 at 8:51 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> STS will rely on Hive execution engine. My Hive uses Spark execution
>> engine so STS will pass the SQL to Hive and let it do the work and return
>> the result set
>>
>>  which beeline
>> /usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
>> ${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
>> xxxxxxxx
>> Connecting to jdbc:hive2://rhes564:10055
>> Connected to: Spark SQL (version 2.0.0)
>> Driver: Hive JDBC (version 1.2.1.spark2)
>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>> Beeline version 1.2.1.spark2 by Apache Hive
>> 0: jdbc:hive2://rhes564:10055>
>>
>> jdbc:hive2://rhes564:10055> select count(1) from test.prices;
>> OK, I ran a simple query in STS; you will see this in hive.log:
>>
>> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
>> get_database: test
>> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
>> get_database: test
>> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>>
>> I think it is a good idea to switch to the Spark engine (as opposed to MR).
>> My tests showed that Hive on Spark, using DAGs and in-memory execution, runs
>> at least an order of magnitude faster than MapReduce.
>>
>> You can either connect to beeline from $HIVE_HOME/... or beeline from
>> $SPARK_HOME
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 13 September 2016 at 23:28, Benjamin Kim <bb...@gmail.com> wrote:
>>
>>> Mich,
>>>
>>> It sounds like that there would be no harm in changing then. Are you
>>> saying that using STS would still use MapReduce to run the SQL statements?
>>> What our users are doing in our CDH 5.7.2 installation is changing the
>>> execution engine to Spark when connected to HiveServer2 to get faster
>>> results. Would they still have to do this using STS? Lastly, we are seeing
>>> zombie YARN jobs left behind even after a user disconnects. Are you seeing
>>> this happen with STS? If not, then this would be even better.
>>>
>>> Thanks for your fast reply.
>>>
>>> Cheers,
>>> Ben
>>>
>>> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> Spark Thrift server (STS) still uses hive thrift server. If you look at
>>> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2)
>>>
>>> function usage {
>>>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server
>>> options]"
>>>   pattern="usage"
>>>   *pattern+="\|Spark assembly has been built with Hive"*
>>>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>>>   pattern+="\|Spark Command: "
>>>   pattern+="\|======="
>>>   pattern+="\|--help"
>>>
>>>
>>> Indeed when you start STS, you pass hiveconf parameter to it
>>>
>>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>>                 --master  \
>>>                 --hiveconf hive.server2.thrift.port=10055 \
>>>
>>> and STS bypasses Spark optimiser and uses Hive optimizer and execution
>>> engine. You will see this in hive.log file
>>>
>>> So I don't think it is going to give you much difference. Unless they
>>> have recently changed the design of STS.
>>>
>>> HTH
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 13 September 2016 at 22:32, Benjamin Kim <bb...@gmail.com> wrote:
>>>
>>>> Does anyone have any thoughts about using Spark SQL Thriftserver in
>>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>>> HiveServer2 for it. Some advice and gotcha’s would be nice to know.
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>>
>>>
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>



-- 
---
Takeshi Yamamuro

Re: Spark SQL Thriftserver

Posted by Mich Talebzadeh <mi...@gmail.com>.
Actually, this is what it says:

Connecting to jdbc:hive2://rhes564:10055
Connected to: Spark SQL (version 2.0.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive

So it uses Spark SQL. However, they do not seem to have upgraded the Beeline
version from 1.2.1.
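The mismatch is visible right in that banner; a small sketch (plain shell, using the banner lines quoted above) pulls out the two version strings:

```shell
# Beeline banner lines, copied from the session above.
banner='Connected to: Spark SQL (version 2.0.0)
Driver: Hive JDBC (version 1.2.1.spark2)'

# Keep only what sits inside "(version ...)" on each line.
versions=$(printf '%s\n' "$banner" | sed -n 's/.*(version \(.*\))$/\1/p')
echo "$versions"
```

So the server reports Spark SQL 2.0.0 while the JDBC driver and Beeline are still at 1.2.1.spark2.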

It is a useful tool with Zeppelin.

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 14 September 2016 at 00:55, ayan guha <gu...@gmail.com> wrote:

> Hi
>
> AFAIK STS uses Spark SQL and not Map Reduce. Is that not correct?
>
> Best
> Ayan
>
> On Wed, Sep 14, 2016 at 8:51 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> STS will rely on Hive execution engine. My Hive uses Spark execution
>> engine so STS will pass the SQL to Hive and let it do the work and return
>> the result set
>>
>>  which beeline
>> /usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
>> ${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
>> xxxxxxxx
>> Connecting to jdbc:hive2://rhes564:10055
>> Connected to: Spark SQL (version 2.0.0)
>> Driver: Hive JDBC (version 1.2.1.spark2)
>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>> Beeline version 1.2.1.spark2 by Apache Hive
>> 0: jdbc:hive2://rhes564:10055>
>>
>> jdbc:hive2://rhes564:10055> select count(1) from test.prices;
>> OK, I ran a simple query in STS; you will see this in hive.log:
>>
>> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
>> get_database: test
>> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
>> get_database: test
>> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
>> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
>> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
>> db=test tbl=prices
>> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit
>> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
>> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
>> tbl=prices
>>
>> I think it is a good idea to switch to the Spark engine (as opposed to MR).
>> My tests showed that Hive on Spark, using DAGs and in-memory execution, runs
>> at least an order of magnitude faster than MapReduce.
>>
>> You can either connect to beeline from $HIVE_HOME/... or beeline from
>> $SPARK_HOME
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 13 September 2016 at 23:28, Benjamin Kim <bb...@gmail.com> wrote:
>>
>>> Mich,
>>>
>>> It sounds like that there would be no harm in changing then. Are you
>>> saying that using STS would still use MapReduce to run the SQL statements?
>>> What our users are doing in our CDH 5.7.2 installation is changing the
>>> execution engine to Spark when connected to HiveServer2 to get faster
>>> results. Would they still have to do this using STS? Lastly, we are seeing
>>> zombie YARN jobs left behind even after a user disconnects. Are you seeing
>>> this happen with STS? If not, then this would be even better.
>>>
>>> Thanks for your fast reply.
>>>
>>> Cheers,
>>> Ben
>>>
>>> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>> Hi,
>>>
>>> Spark Thrift server (STS) still uses hive thrift server. If you look at
>>> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2)
>>>
>>> function usage {
>>>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server
>>> options]"
>>>   pattern="usage"
>>>   *pattern+="\|Spark assembly has been built with Hive"*
>>>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>>>   pattern+="\|Spark Command: "
>>>   pattern+="\|======="
>>>   pattern+="\|--help"
>>>
>>>
>>> Indeed when you start STS, you pass hiveconf parameter to it
>>>
>>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>>                 --master  \
>>>                 --hiveconf hive.server2.thrift.port=10055 \
>>>
>>> and STS bypasses Spark optimiser and uses Hive optimizer and execution
>>> engine. You will see this in hive.log file
>>>
>>> So I don't think it is going to give you much difference. Unless they
>>> have recently changed the design of STS.
>>>
>>> HTH
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 13 September 2016 at 22:32, Benjamin Kim <bb...@gmail.com> wrote:
>>>
>>>> Does anyone have any thoughts about using Spark SQL Thriftserver in
>>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>>> HiveServer2 for it. Some advice and gotcha’s would be nice to know.
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>>
>>>
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>

Re: Spark SQL Thriftserver

Posted by ayan guha <gu...@gmail.com>.
Hi

AFAIK STS uses Spark SQL and not Map Reduce. Is that not correct?

Best
Ayan

On Wed, Sep 14, 2016 at 8:51 AM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> STS will rely on Hive execution engine. My Hive uses Spark execution
> engine so STS will pass the SQL to Hive and let it do the work and return
> the result set
>
>  which beeline
> /usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
> ${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
> xxxxxxxx
> Connecting to jdbc:hive2://rhes564:10055
> Connected to: Spark SQL (version 2.0.0)
> Driver: Hive JDBC (version 1.2.1.spark2)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> Beeline version 1.2.1.spark2 by Apache Hive
> 0: jdbc:hive2://rhes564:10055>
>
> jdbc:hive2://rhes564:10055> select count(1) from test.prices;
> OK, I ran a simple query in STS; you will see this in hive.log:
>
> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
> get_database: test
> 2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217
> get_database: test
> 2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
> (HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
> db=test tbl=prices
> 2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit
> (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
> ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
> tbl=prices
>
> I think it is a good idea to switch to the Spark engine (as opposed to MR). My
> tests showed that Hive on Spark, using DAGs and in-memory execution, runs at
> least an order of magnitude faster than MapReduce.
>
> You can either connect to beeline from $HIVE_HOME/... or beeline from
> $SPARK_HOME
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 13 September 2016 at 23:28, Benjamin Kim <bb...@gmail.com> wrote:
>
>> Mich,
>>
>> It sounds like that there would be no harm in changing then. Are you
>> saying that using STS would still use MapReduce to run the SQL statements?
>> What our users are doing in our CDH 5.7.2 installation is changing the
>> execution engine to Spark when connected to HiveServer2 to get faster
>> results. Would they still have to do this using STS? Lastly, we are seeing
>> zombie YARN jobs left behind even after a user disconnects. Are you seeing
>> this happen with STS? If not, then this would be even better.
>>
>> Thanks for your fast reply.
>>
>> Cheers,
>> Ben
>>
>> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>> Hi,
>>
>> Spark Thrift server (STS) still uses hive thrift server. If you look at
>> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2)
>>
>> function usage {
>>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server
>> options]"
>>   pattern="usage"
>>   *pattern+="\|Spark assembly has been built with Hive"*
>>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>>   pattern+="\|Spark Command: "
>>   pattern+="\|======="
>>   pattern+="\|--help"
>>
>>
>> Indeed when you start STS, you pass hiveconf parameter to it
>>
>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>                 --master  \
>>                 --hiveconf hive.server2.thrift.port=10055 \
>>
>> and STS bypasses Spark optimiser and uses Hive optimizer and execution
>> engine. You will see this in hive.log file
>>
>> So I don't think it is going to give you much difference. Unless they
>> have recently changed the design of STS.
>>
>> HTH
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 13 September 2016 at 22:32, Benjamin Kim <bb...@gmail.com> wrote:
>>
>>> Does anyone have any thoughts about using Spark SQL Thriftserver in
>>> Spark 1.6.2 instead of HiveServer2? We are considering abandoning
>>> HiveServer2 for it. Some advice and gotcha’s would be nice to know.
>>>
>>> Thanks,
>>> Ben
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>>
>>
>>
>


-- 
Best Regards,
Ayan Guha

Re: Spark SQL Thriftserver

Posted by Mich Talebzadeh <mi...@gmail.com>.
STS will rely on the Hive execution engine. My Hive uses the Spark execution
engine, so STS will pass the SQL to Hive, let it do the work, and return the
result set.

 which beeline
/usr/lib/spark-2.0.0-bin-hadoop2.6/bin/beeline
${SPARK_HOME}/bin/beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
xxxxxxxx
Connecting to jdbc:hive2://rhes564:10055
Connected to: Spark SQL (version 2.0.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1.spark2 by Apache Hive
0: jdbc:hive2://rhes564:10055>

jdbc:hive2://rhes564:10055> select count(1) from test.prices;
OK, I ran a simple query in STS; you will see this in hive.log:

2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database:
test
2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
db=test tbl=prices
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
tbl=prices
2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
db=test tbl=prices
2016-09-13T23:44:51,007 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
tbl=prices
2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_database:
test
2016-09-13T23:44:51,021 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
db=test tbl=prices
2016-09-13T23:44:51,023 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
tbl=prices
2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: metastore.HiveMetaStore
(HiveMetaStore.java:logInfo(670)) - 4: source:50.140.197.217 get_table :
db=test tbl=prices
2016-09-13T23:44:51,029 INFO  [pool-4-thread-4]: HiveMetaStore.audit
(HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser
ip=50.140.197.217       cmd=source:50.140.197.217 get_table : db=test
tbl=prices
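Those audit lines are easy to tally if you want to see which metastore calls STS is issuing; a quick sketch over two sample lines (copied from the excerpt above and re-joined where the mail wrapped them):

```shell
# Two HiveMetaStore.audit lines from the hive.log excerpt above.
log='2016-09-13T23:44:50,996 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_database: test
2016-09-13T23:44:50,998 INFO  [pool-4-thread-4]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(280)) - ugi=hduser ip=50.140.197.217 cmd=source:50.140.197.217 get_table : db=test tbl=prices'

# Count how many audit lines mention each metastore call.
n_db=$(printf '%s\n' "$log" | grep -c 'get_database')
n_tbl=$(printf '%s\n' "$log" | grep -c 'get_table')
echo "get_database=$n_db get_table=$n_tbl"
```

Running the same pipeline over the real hive.log (instead of the two sample lines) shows how chatty each query is against the metastore.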

I think it is a good idea to switch to the Spark engine (as opposed to MR). My
tests showed that Hive on Spark, using DAGs and in-memory execution, runs at
least an order of magnitude faster than MapReduce.

You can connect with beeline either from $HIVE_HOME/... or from $SPARK_HOME.

HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 13 September 2016 at 23:28, Benjamin Kim <bb...@gmail.com> wrote:

> Mich,
>
> It sounds like that there would be no harm in changing then. Are you
> saying that using STS would still use MapReduce to run the SQL statements?
> What our users are doing in our CDH 5.7.2 installation is changing the
> execution engine to Spark when connected to HiveServer2 to get faster
> results. Would they still have to do this using STS? Lastly, we are seeing
> zombie YARN jobs left behind even after a user disconnects. Are you seeing
> this happen with STS? If not, then this would be even better.
>
> Thanks for your fast reply.
>
> Cheers,
> Ben
>
> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
> Hi,
>
> Spark Thrift server (STS) still uses hive thrift server. If you look at
> $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2)
>
> function usage {
>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
>   pattern="usage"
>   *pattern+="\|Spark assembly has been built with Hive"*
>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>   pattern+="\|Spark Command: "
>   pattern+="\|======="
>   pattern+="\|--help"
>
>
> Indeed when you start STS, you pass hiveconf parameter to it
>
> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>                 --master  \
>                 --hiveconf hive.server2.thrift.port=10055 \
>
> and STS bypasses Spark optimiser and uses Hive optimizer and execution
> engine. You will see this in hive.log file
>
> So I don't think it is going to give you much difference. Unless they have
> recently changed the design of STS.
>
> HTH
>
>
>
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 13 September 2016 at 22:32, Benjamin Kim <bb...@gmail.com> wrote:
>
>> Does anyone have any thoughts about using Spark SQL Thriftserver in Spark
>> 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for
>> it. Some advice and gotcha’s would be nice to know.
>>
>> Thanks,
>> Ben
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
>
>

Re: Spark SQL Thriftserver

Posted by Benjamin Kim <bb...@gmail.com>.
Mich,

It sounds like there would be no harm in changing, then. Are you saying that using STS would still use MapReduce to run the SQL statements? What our users are doing in our CDH 5.7.2 installation is changing the execution engine to Spark when connected to HiveServer2 to get faster results. Would they still have to do this using STS? Lastly, we are seeing zombie YARN jobs left behind even after a user disconnects. Are you seeing this happen with STS? If not, then this would be even better.
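On the zombie-jobs point, one stopgap while investigating is the standard YARN CLI (an operational sketch, not STS-specific advice; the application id shown is made up for illustration):

```shell
# List applications YARN still reports as RUNNING.
yarn application -list -appStates RUNNING

# Kill a leftover application by its id (hypothetical id).
yarn application -kill application_1473801234567_0042
```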

Thanks for your fast reply.

Cheers,
Ben

> On Sep 13, 2016, at 3:15 PM, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Hi,
> 
> Spark Thrift server (STS) still uses hive thrift server. If you look at $SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2)
> 
> function usage {
>   echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
>   pattern="usage"
>   pattern+="\|Spark assembly has been built with Hive"
>   pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
>   pattern+="\|Spark Command: "
>   pattern+="\|======="
>   pattern+="\|--help"
> 
> 
> Indeed when you start STS, you pass hiveconf parameter to it
> 
> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>                 --master  \
>                 --hiveconf hive.server2.thrift.port=10055 \
> 
> and STS bypasses Spark optimiser and uses Hive optimizer and execution engine. You will see this in hive.log file
> 
> So I don't think it is going to give you much difference. Unless they have recently changed the design of STS.
> 
> HTH
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>  
> 
> On 13 September 2016 at 22:32, Benjamin Kim <bbuild11@gmail.com> wrote:
> Does anyone have any thoughts about using Spark SQL Thriftserver in Spark 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for it. Some advice and gotcha’s would be nice to know.
> 
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 
> 


Re: Spark SQL Thriftserver

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Spark Thrift server (STS) still uses the Hive Thrift server. If you look at
$SPARK_HOME/sbin/start-thriftserver.sh you will see (mine is Spark 2):

function usage {
  echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
  pattern="usage"
  pattern+="\|Spark assembly has been built with Hive"
  pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
  pattern+="\|Spark Command: "
  pattern+="\|======="
  pattern+="\|--help"


Indeed, when you start STS, you pass a hiveconf parameter to it:

${SPARK_HOME}/sbin/start-thriftserver.sh \
                --master  \
                --hiveconf hive.server2.thrift.port=10055 \

and STS bypasses the Spark optimizer and uses the Hive optimizer and execution
engine. You will see this in the hive.log file.

So I don't think it is going to make much difference, unless they have
recently changed the design of STS.
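For completeness, a slightly fuller launch/stop sketch (the master value and extra bind-host property are illustrative, not taken from this thread; the stop script ships alongside the start script in $SPARK_HOME/sbin):

```shell
# Start the Thrift server on YARN with an explicit port and bind host
# (values here are illustrative).
${SPARK_HOME}/sbin/start-thriftserver.sh \
    --master yarn \
    --hiveconf hive.server2.thrift.port=10055 \
    --hiveconf hive.server2.thrift.bind.host=0.0.0.0

# Stop it with the matching script.
${SPARK_HOME}/sbin/stop-thriftserver.sh
```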

HTH




Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 13 September 2016 at 22:32, Benjamin Kim <bb...@gmail.com> wrote:

> Does anyone have any thoughts about using Spark SQL Thriftserver in Spark
> 1.6.2 instead of HiveServer2? We are considering abandoning HiveServer2 for
> it. Some advice and gotcha’s would be nice to know.
>
> Thanks,
> Ben
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>