Posted to dev@spark.apache.org by Tao Li <li...@apache.org> on 2021/08/27 01:32:21 UTC

How to improve the concurrent query performance of spark SQL query

In high-concurrency scenarios, the query performance of Spark SQL is limited by the NameNode and the Hive Metastore. There are some caches in the code, but their effect is limited. Is there a practical and effective way to reduce the time spent in the driver when queries run concurrently?

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: How to improve the concurrent query performance of spark SQL query

Posted by Mich Talebzadeh <mi...@gmail.com>.

There are many ways of interacting with a Hive data warehouse from Spark.

You can either use Spark's native Hive API or a JDBC connection (from
local or remote Spark).
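
The two access paths mentioned above can be sketched in PySpark roughly as follows. This is a minimal illustration, not a recommended setup: the helper names (`hive_jdbc_options`, `read_native`, `read_via_jdbc`) and the host, database and table values are hypothetical, and the JDBC path assumes the Hive JDBC driver jar is on Spark's classpath and that a HiveServer2 endpoint is reachable.

```python
from typing import Dict


def hive_jdbc_options(host: str, port: int, database: str, table: str) -> Dict[str, str]:
    """Build options for Spark's generic JDBC reader pointed at HiveServer2.

    Hypothetical helper; the caller supplies host, port, database and table.
    """
    return {
        "url": f"jdbc:hive2://{host}:{port}/{database}",
        "driver": "org.apache.hive.jdbc.HiveDriver",
        "dbtable": table,
    }


def read_native(table: str):
    """Path 1: Spark's native Hive support.

    Spark asks the Hive metastore for table metadata and then reads the
    underlying files itself.
    """
    from pyspark.sql import SparkSession  # imported lazily; needs a Spark install

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    return spark.table(table)


def read_via_jdbc(options: Dict[str, str]):
    """Path 2: JDBC. The query is sent to HiveServer2, which executes it."""
    from pyspark.sql import SparkSession  # imported lazily; needs a Spark install

    spark = SparkSession.builder.getOrCreate()
    return spark.read.format("jdbc").options(**options).load()


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(hive_jdbc_options("hs2.example.com", 10000, "default", "sales"))
```

In the first path the metastore is consulted for metadata only; in the second, Hive also executes the query, so Hive's own concurrency settings matter for both metadata and execution.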

What does "the driver" refer to in this context? Bottom line: with
concurrent queries you will have to go through Hive, and that is where, as
you pointed out, you may have concurrency issues. Spark, IMO, does not play
a significant role here. Your concurrency limits will arise from the way
Hive is configured to handle multiple threads. If the Hive metastore is on
Oracle, you can expect very good performance. On the other hand, if you use
MySQL or similar, you will have a bottleneck on the Hive side.
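
One way to check where the time goes is to measure per-call latency at increasing concurrency. The sketch below uses only the Python standard library; `latencies_at` and `fake_metadata_call` are hypothetical names, and the fake call is a stand-in that just sleeps, so the numbers it produces say nothing about a real metastore.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, List


def timed(call: Callable[[], object]) -> float:
    """Run one call and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def latencies_at(concurrency: int, call: Callable[[], object], repeats: int = 20) -> List[float]:
    """Run `call` `repeats` times on `concurrency` threads; return per-call seconds."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed, call) for _ in range(repeats)]
        return [f.result() for f in futures]


def fake_metadata_call() -> None:
    """Stand-in for a real metadata lookup (assumption: replace with your own call)."""
    time.sleep(0.01)


if __name__ == "__main__":
    # If mean latency climbs as concurrency rises, the call is contending
    # on something shared (for example the metastore or its backing database).
    for n in (1, 4, 16):
        print(n, round(mean(latencies_at(n, fake_metadata_call)), 4))
```

To use it against a real system, replace `fake_metadata_call` with a callable that issues a metadata-only request (for example a lambda wrapping `spark.catalog.listTables(...)`) and compare it with one that runs a full query.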

HTH

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On Fri, 27 Aug 2021 at 02:32, Tao Li <li...@apache.org> wrote:

> In the high concurrency scenario, the query performance of spark SQL is
> limited by namenode and hive Metastore. There are some caches in the code,
> but the effect is limited. Do we have a practical and effective way to
> solve the time-consuming problem of driver in concurrent query?