Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/07/13 07:33:43 UTC

Spark Thrift Server performance

Hi,

I need some feedback on the performance of the Spark Thrift Server (STS)

As far as I can ascertain, one can start STS by passing the usual Spark parameters:

${SPARK_HOME}/sbin/start-thriftserver.sh \
                --master spark://50.140.197.217:7077 \
                --hiveconf hive.server2.thrift.port=10055 \
                --packages <PACKAGES> \
                --driver-memory 2G \
                --num-executors 2 \
                --executor-memory 2G \
                --conf "spark.scheduler.mode=FAIR" \
                --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps" \
                --jars <JAR_LIST> \
                --conf "spark.ui.port=12345"


And access it via the beeline JDBC client:

beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
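As an aside, since the launch command above sets spark.scheduler.mode=FAIR, one can also define named scheduler pools in a fairscheduler.xml file. A minimal sketch (pool name and values here are purely illustrative, not taken from any real setup); it would be passed to the server via --conf "spark.scheduler.allocation.file=/path/to/fairscheduler.xml":

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Pool for interactive JDBC sessions; weight/minShare values are illustrative -->
  <pool name="interactive">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>1</minShare>
  </pool>
</allocations>
```

Without such a file, FAIR mode still works but every query lands in a single default pool.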

Now the questions I have:


   1. What is the limit on the number of users accessing the Thrift server?
   2. Clearly the Thrift server can be started with a resource configuration.
   Put simply, does STS act as a gateway to Spark (meaning Spark apps can use
   their own resources), or is one limited to the resources that STS offers?
   3. Can one start multiple Thrift servers?

As far as I can see, STS is equivalent to Spark SQL accessing the Hive data
warehouse. Indeed, this is what the beeline banner says:

Connecting to jdbc:hive2://rhes564:10055
Connected to: Spark SQL (version 1.6.1)
Driver: Spark Project Core (version 1.6.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.6.1 by Apache Hive
0: jdbc:hive2://rhes564:10055>

Thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Spark Thrift Server performance

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks, guys.

Any idea on this:

What is the limit on the number of users accessing the Thrift server
concurrently? Say we are using YARN: will YARN control the apps accessing the
Thrift server, or will each user, armed with beeline, connect to the Thrift
server directly? Say my STS has the configuration below:

                --driver-memory 8G \
                --num-executors 2 \
                --executor-memory 8G \
                --executor-cores 4 \

I may not be making sense, but if the sum of the resources for the
"concurrent" connections exceeds the STS limits above, I gather the new
connections will hang? This assumes that we are running a single STS on a
single node.
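To make that sum concrete, a rough sketch (the function below is just arithmetic for illustration, not a real Spark API, and the numbers mirror the configuration above): the executor resources are fixed once when STS starts and are shared by every JDBC session, rather than being multiplied per connection.

```python
# Illustrative sketch: a single STS instance owns a fixed pool of executor
# resources; all concurrent JDBC sessions share it.
def sts_capacity(num_executors, executor_memory_gb, executor_cores):
    """Total executor resources owned by one STS instance."""
    return {
        "total_memory_gb": num_executors * executor_memory_gb,
        "total_cores": num_executors * executor_cores,
    }

# Matches the --num-executors 2 / --executor-memory 8G / --executor-cores 4 above:
cap = sts_capacity(num_executors=2, executor_memory_gb=8, executor_cores=4)
print(cap)  # {'total_memory_gb': 16, 'total_cores': 8}

# With spark.scheduler.mode=FAIR, N concurrent queries time-share these
# 8 task slots; extra work queues for slots rather than acquiring new
# executors per connection.
```

So with these settings there are 8 task slots in total: the scheduler queues excess work across sessions rather than granting each connection its own resources.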

Cheers









On 13 July 2016 at 17:09, ayan guha <gu...@gmail.com> wrote:

> Not really, that is not the primary intention. Our main goal is poor man's
> high availability (as STS does not provide HA mechanism like HS2) :).
> Additionally, we have made STS part of Ambari AUTO_START group, so Ambari
> brings up STS if it goes down for some intermittent reason.
>
>
>
> On Thu, Jul 14, 2016 at 1:38 AM, Michael Segel <ms...@hotmail.com>
> wrote:
>
>> Hey, silly question?
>>
>> If you’re running a load balancer, are you trying to reuse the RDDs
>> between jobs?
>>
>> TIA
>> -Mike
>>
>> On Jul 13, 2016, at 9:08 AM, ayan guha <gu...@gmail.com> wrote:
>>
>> My 2 cents:
>>
>> Yes, we are running multiple STS (we are running on different nodes, but
>> you can run on same node, different ports). Using Ambari, it is really
>> convenient to manage.
>>
>> We have set up a nginx load balancer as well pointing to both services
>> and all our external BI tools connect to the load balancer.
>>
>> STS works as an YARN Client application, where STS is the driver.
>>
>>
>>
>> On Wed, Jul 13, 2016 at 5:33 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I need some feedback on the performance of the Spark Thrift Server (STS)
>>>
>>> As far I can ascertain one can start STS passing the usual spark
>>> parameters
>>>
>>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>>                 --master spark://50.140.197.217:7077 \
>>>                 --hiveconf hive.server2.thrift.port=10055 \
>>>                 --packages <PACKAGES> \
>>>                 --driver-memory 2G \
>>>                 --num-executors 2 \
>>>                 --executor-memory 2G \
>>>                 --conf "spark.scheduler.mode=FAIR" \
>>>                 --conf
>>> "spark.executor.extraJavaOptions=-XX:+PrintGCDetails
>>> -XX:+PrintGCTimeStamps" \
>>>                 --jars <JAR_LIST> \
>>>                 --conf "spark.ui.port=12345"
>>>
>>>
>>>   And accessing it via beeline JDBC client
>>>
>>> beeline -u jdbc:hive2://rhes564:10055 -n hduser -p
>>>
>>> Now the questions I have
>>>
>>>
>>>    1. What is the limit on the number of users accessing the thrift
>>>    server.
>>>    2. Clearly the thrift server can start with resource configuration.
>>>    In a simple way does STS act as a gateway to Spark (meaning Spark apps can
>>>    use their own resources) or one is limited to resource that STS offers?
>>>    3. Can one start multiple thrift servers
>>>
>>> As far as I can see STS is equivalent to Spark SQL accessing Hive DW.
>>> Indeed this is what it says:
>>>
>>> Connecting to jdbc:hive2://rhes564:10055
>>> Connected to: Spark SQL (version 1.6.1)
>>> Driver: Spark Project Core (version 1.6.1)
>>> Transaction isolation: TRANSACTION_REPEATABLE_READ
>>> Beeline version 1.6.1 by Apache Hive
>>> 0: jdbc:hive2://rhes564:10055>
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>

Re: Spark Thrift Server performance

Posted by ayan guha <gu...@gmail.com>.
Not really; that is not the primary intention. Our main goal is poor man's
high availability (as STS does not provide an HA mechanism the way HS2 does)
:). Additionally, we have made STS part of Ambari's AUTO_START group, so
Ambari brings STS back up if it goes down for some intermittent reason.





-- 
Best Regards,
Ayan Guha

Re: Spark Thrift Server performance

Posted by Michael Segel <ms...@hotmail.com>.
Hey, silly question:

If you're running a load balancer, are you trying to reuse the RDDs between jobs?

TIA
-Mike



Re: Spark Thrift Server performance

Posted by ayan guha <gu...@gmail.com>.
My 2 cents:

Yes, we are running multiple STS instances (we run them on different nodes,
but you can run them on the same node on different ports). Using Ambari, it
is really convenient to manage.

We have also set up an nginx load balancer pointing to both services, and
all our external BI tools connect to the load balancer.

STS runs as a YARN client-mode application, with STS itself acting as the
driver.
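For illustration, a load balancer in front of two STS instances might look like the sketch below (hostnames and the listen port are assumptions, not our actual config). Since the HiveServer2 Thrift binary protocol is plain TCP, this uses nginx's stream module rather than an http block:

```nginx
stream {
    upstream spark_thrift {
        server sts-node1:10055;   # first STS instance (hypothetical host)
        server sts-node2:10055;   # second STS instance (hypothetical host)
    }
    server {
        listen 10000;             # single endpoint the BI tools connect to
        proxy_pass spark_thrift;
    }
}
```

BI tools then connect to jdbc:hive2://<lb-host>:10000. Note that each session sticks to whichever instance its TCP connection landed on, so cached tables are per-instance, not shared across the pair.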






-- 
Best Regards,
Ayan Guha