Posted to user@spark.apache.org by "weicm@foxmail.com" <we...@foxmail.com> on 2022/12/28 03:32:52 UTC

Spark thrift driver memory leak

The Spark Thrift Server runs out of memory after running for a period of time. Thanks for any help!


ENV
Spark SQL (version 3.2.1)
Driver: Hive JDBC (version 2.3.9)
Hadoop 3.1.0.3.0.0.0-1634

Start Command
/usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh --properties-file /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf

See attachment for configuration: spark-thrift-sparkconf.conf

weicm@foxmail.com


Re: Re: Spark thrift driver memory leak

Posted by Bjørn Jørgensen <bj...@gmail.com>.
Yes, you are right that Hive has not been updated.
But between Spark 3.2.1 and 3.2.3, 161 JIRA tickets have been fixed,
which means that if you upgrade to 3.2.3 you can eliminate 161 other
things that might be causing this error.



tor. 29. des. 2022 kl. 07:09 skrev weicm@foxmail.com <we...@foxmail.com>:

> Nothing has changed about Thrift in Spark 3.2.3.
>
>
> ------------------------------
> weicm@foxmail.com
>
>
> *From:* Bjørn Jørgensen <bj...@gmail.com>
> *Date:* 2022-12-28 22:06
> *To:* Gourav Sengupta <go...@gmail.com>
> *CC:* weicm@foxmail.com; user <us...@spark.apache.org>
> *Subject:* Re: Re: Spark thrift driver memory leak
> The first thing I will suggest is to upgrade from spark 3.2.1 to 3.2.3
>
> Apache spark has released 3.2.3
> <https://spark.apache.org/news/spark-3-2-3-released.html> which fixes a
> lot of things.
>
> ons. 28. des. 2022 kl. 09:33 skrev Gourav Sengupta <
> gourav.sengupta@gmail.com>:
>
>> Hi,
>>
>> Look at the number of variables that you have look at and tweak - this is
>> rocket science, I think that even rockets have less number of variables
>> available on their dashboard than it requires for queries to run in an
>> optimal way in SPARK in a sustained manner.
>>
>> this is the classical problem in the world now where people are
>> losing jobs because
>> > the question we are trying to answer is "how to make a rocket science
>> like SPARK work?"
>> > rather than asking "how to run the same queries run at fraction of cost
>> throughout the lifecycle of the data?"
>>
>> The only people who are gaining from the complications of running a
>> simple query are those:
>> > introducing those complications in infrastructure through completely
>> and utterly unnecessary containerisations, and
>> > the companies running out of SPARK
>> everyone else is certainly not winning if the same solutions can be run
>> at a fraction of the cost in a market.
>>
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Dec 28, 2022 at 8:16 AM weicm@foxmail.com <we...@foxmail.com>
>> wrote:
>>
>>> Thank you very much for your suggestion, but we are using a private
>>> on-premises cluster in Spark on Hadoop mode, so I have to continue looking
>>> for a solution.
>>>
>>> ------------------------------
>>> weicm@foxmail.com
>>>
>>>
>>> *From:* Gourav Sengupta <go...@gmail.com>
>>> *Date:* 2022-12-28 15:06
>>> *To:* weicm@foxmail.com
>>> *CC:* user <us...@spark.apache.org>
>>> *Subject:* Re: Spark thrift driver memory leak
>>> Hi,
>>>
>>> have you tried redshift or snowflake?
>>>
>>> SPARK is too complicated and too much of rocket science to manage simple
>>> operations.
>>>
>>> Also if you are in AWS try to use EMR based Presto or Trino, they can
>>> aggregate massive data at 1000x the cost.
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
>>> wrote:
>>>
>>>> Spark Thrift Server memory overflow after running for a period of time,
>>>> thanks for any help!
>>>>
>>>>
>>>> *ENV*
>>>> Spark SQL (version 3.2.1)
>>>> Driver: Hive JDBC (version 2.3.9)
>>>> Hadoop 3.1.0.3.0.0.0-1634
>>>>
>>>> *Start Command*
>>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>>>> --properties-file
>>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>>>
>>>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>>>
>>>> weicm@foxmail.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>
>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297

Re: Re: Spark thrift driver memory leak

Posted by "weicm@foxmail.com" <we...@foxmail.com>.
Nothing has changed about Thrift in Spark 3.2.3.




weicm@foxmail.com
 
From: Bjørn Jørgensen
Date: 2022-12-28 22:06
To: Gourav Sengupta
CC: weicm@foxmail.com; user
Subject: Re: Re: Spark thrift driver memory leak
The first thing I will suggest is to upgrade from spark 3.2.1 to 3.2.3 

Apache spark has released 3.2.3 which fixes a lot of things. 

ons. 28. des. 2022 kl. 09:33 skrev Gourav Sengupta <go...@gmail.com>:
Hi,

Look at the number of variables that you have look at and tweak - this is rocket science, I think that even rockets have less number of variables available on their dashboard than it requires for queries to run in an optimal way in SPARK in a sustained manner.

this is the classical problem in the world now where people are losing jobs because 
> the question we are trying to answer is "how to make a rocket science like SPARK work?" 
> rather than asking "how to run the same queries run at fraction of cost throughout the lifecycle of the data?"

The only people who are gaining from the complications of running a simple query are those:
> introducing those complications in infrastructure through completely and utterly unnecessary containerisations, and 
> the companies running out of SPARK
everyone else is certainly not winning if the same solutions can be run at a fraction of the cost in a market.



Regards,
Gourav Sengupta

On Wed, Dec 28, 2022 at 8:16 AM weicm@foxmail.com <we...@foxmail.com> wrote:
Thank you very much for your suggestion, but we are using a private on-premises cluster in Spark on Hadoop mode, so I have to continue looking for a solution.
 


weicm@foxmail.com
 
From: Gourav Sengupta
Date: 2022-12-28 15:06
To: weicm@foxmail.com
CC: user
Subject: Re: Spark thrift driver memory leak
Hi,

have you tried redshift or snowflake? 

SPARK is too complicated and too much of rocket science to manage simple operations.

Also if you are in AWS try to use EMR based Presto or Trino, they can aggregate massive data at 1000x the cost.


Regards,
Gourav Sengupta

On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com> wrote:
Spark Thrift Server memory overflow after running for a period of time, thanks for any help!


ENV
Spark SQL (version 3.2.1)
Driver: Hive JDBC (version 2.3.9)
Hadoop 3.1.0.3.0.0.0-1634

Start Command
/usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh --properties-file /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf

See attachment for configuration: spark-thrift-sparkconf.conf

weicm@foxmail.com

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


-- 
Bjørn Jørgensen 
Vestre Aspehaug 4, 6010 Ålesund 
Norge

+47 480 94 297

Re: Spark thrift driver memory leak

Posted by Cheng Pan <ch...@apache.org>.
The object references show that SparkExecuteStatementOperation instances
have leaked, which indicates those operations are not being closed properly.
How do you access the STS? Do you use Hive JDBC or a Thrift client directly?

For a JDBC client, please check that you close the statement properly; for
a Thrift client, please check that you close the opHandle after consuming
the result set.
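
As a minimal sketch of the close-it-properly discipline above (illustrative
only: FakeStatement is a hypothetical stand-in, not the real Hive JDBC or
Thrift API), a context manager guarantees close() runs even when the query
fails, the same guarantee Java's try-with-resources gives a JDBC client:

```python
# Illustrative only: FakeStatement stands in for a JDBC Statement or a
# Thrift opHandle; it is NOT the real Hive JDBC / Thrift API.
closed = []

class FakeStatement:
    def execute(self):
        # Simulate a query that fails mid-way.
        raise RuntimeError("query failed")

    def close(self):
        closed.append("statement")

    # Context-manager protocol: __exit__ runs even if the body raises,
    # mirroring Java's try-with-resources.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow the exception

try:
    with FakeStatement() as stmt:
        stmt.execute()
except RuntimeError:
    pass

# close() ran despite the failure, so nothing is left open server-side.
print(closed)  # ['statement']
```

The same pattern applies to the real clients: close the ResultSet,
Statement, and Connection (or the Thrift opHandle) on every path, including
failures, so a dead query cannot leave its operation open on the server.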

BTW, I recommend using Apache Kyuubi[1] as a drop-in replacement for STS;
basically, Kyuubi is a super-STS.

STS:
Thrift Client => Thrift Server + Driver (client deploy mode only)

Kyuubi:
Thrift Client => Thrift Server + RPC Client => RPC Server + Driver (all
deploy modes supported by Spark)

As you can see, the Spark Driver is separated from the Thrift Server
process, which makes the Thrift Server process lighter and more robust, and
then:

   - The Thrift Server can manage the lifecycle of the Spark Driver, e.g.
   lazily creating the Driver and releasing idle Spark Drivers to save
   resources.
   - The Thrift Server can route queries to Spark Drivers based on
   different strategies[2], e.g. creating a new Spark Driver for each
   query; one user shares a Spark Driver; users in the same group share a
   Spark Driver; all queries share a Spark Driver (just the same as STS).
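
The routing strategies above correspond to Kyuubi's engine share level
setting; a sketch of the relevant configuration (verify exact property
names against your Kyuubi version, see [2]):

```properties
# kyuubi-defaults.conf (sketch; check the docs for your Kyuubi version)
#   CONNECTION - a new Spark engine per connection
#   USER       - one engine shared by all connections of the same user
#   GROUP      - one engine shared by users in the same group
#   SERVER     - one engine shared by all queries (closest to STS)
kyuubi.engine.share.level=USER
```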


[1] https://kyuubi.apache.org/
[2] https://kyuubi.apache.org/docs/latest/deployment/engine_share_level.html

Thanks,
Cheng Pan


On Dec 28, 2022 at 22:06:07, Bjørn Jørgensen <bj...@gmail.com>
wrote:

> The first thing I will suggest is to upgrade from spark 3.2.1 to 3.2.3
>
> Apache spark has released 3.2.3
> <https://spark.apache.org/news/spark-3-2-3-released.html> which fixes a
> lot of things.
>
> ons. 28. des. 2022 kl. 09:33 skrev Gourav Sengupta <
> gourav.sengupta@gmail.com>:
>
>> Hi,
>>
>> Look at the number of variables that you have look at and tweak - this is
>> rocket science, I think that even rockets have less number of variables
>> available on their dashboard than it requires for queries to run in an
>> optimal way in SPARK in a sustained manner.
>>
>> this is the classical problem in the world now where people are
>> losing jobs because
>> > the question we are trying to answer is "how to make a rocket science
>> like SPARK work?"
>> > rather than asking "how to run the same queries run at fraction of cost
>> throughout the lifecycle of the data?"
>>
>> The only people who are gaining from the complications of running a
>> simple query are those:
>> > introducing those complications in infrastructure through completely
>> and utterly unnecessary containerisations, and
>> > the companies running out of SPARK
>> everyone else is certainly not winning if the same solutions can be run
>> at a fraction of the cost in a market.
>>
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Dec 28, 2022 at 8:16 AM weicm@foxmail.com <we...@foxmail.com>
>> wrote:
>>
>>> Thank you very much for your suggestion, but we are using a private
>>> on-premises cluster in Spark on Hadoop mode, so I have to continue looking
>>> for a solution.
>>>
>>> ------------------------------
>>> weicm@foxmail.com
>>>
>>>
>>> *From:* Gourav Sengupta <go...@gmail.com>
>>> *Date:* 2022-12-28 15:06
>>> *To:* weicm@foxmail.com
>>> *CC:* user <us...@spark.apache.org>
>>> *Subject:* Re: Spark thrift driver memory leak
>>> Hi,
>>>
>>> have you tried redshift or snowflake?
>>>
>>> SPARK is too complicated and too much of rocket science to manage simple
>>> operations.
>>>
>>> Also if you are in AWS try to use EMR based Presto or Trino, they can
>>> aggregate massive data at 1000x the cost.
>>>
>>>
>>> Regards,
>>> Gourav Sengupta
>>>
>>> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
>>> wrote:
>>>
>>>> Spark Thrift Server memory overflow after running for a period of time,
>>>> thanks for any help!
>>>>
>>>>
>>>> *ENV*
>>>> Spark SQL (version 3.2.1)
>>>> Driver: Hive JDBC (version 2.3.9)
>>>> Hadoop 3.1.0.3.0.0.0-1634
>>>>
>>>> *Start Command*
>>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>>>> --properties-file
>>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>>>
>>>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>>>
>>>> weicm@foxmail.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>

Re: Re: Spark thrift driver memory leak

Posted by Bjørn Jørgensen <bj...@gmail.com>.
The first thing I would suggest is to upgrade from Spark 3.2.1 to 3.2.3.

Apache Spark has released 3.2.3
<https://spark.apache.org/news/spark-3-2-3-released.html>, which fixes a
lot of things.

ons. 28. des. 2022 kl. 09:33 skrev Gourav Sengupta <
gourav.sengupta@gmail.com>:

> Hi,
>
> Look at the number of variables that you have look at and tweak - this is
> rocket science, I think that even rockets have less number of variables
> available on their dashboard than it requires for queries to run in an
> optimal way in SPARK in a sustained manner.
>
> this is the classical problem in the world now where people are
> losing jobs because
> > the question we are trying to answer is "how to make a rocket science
> like SPARK work?"
> > rather than asking "how to run the same queries run at fraction of cost
> throughout the lifecycle of the data?"
>
> The only people who are gaining from the complications of running a simple
> query are those:
> > introducing those complications in infrastructure through completely and
> utterly unnecessary containerisations, and
> > the companies running out of SPARK
> everyone else is certainly not winning if the same solutions can be run at
> a fraction of the cost in a market.
>
>
>
> Regards,
> Gourav Sengupta
>
> On Wed, Dec 28, 2022 at 8:16 AM weicm@foxmail.com <we...@foxmail.com>
> wrote:
>
>> Thank you very much for your suggestion, but we are using a private
>> on-premises cluster in Spark on Hadoop mode, so I have to continue looking
>> for a solution.
>>
>> ------------------------------
>> weicm@foxmail.com
>>
>>
>> *From:* Gourav Sengupta <go...@gmail.com>
>> *Date:* 2022-12-28 15:06
>> *To:* weicm@foxmail.com
>> *CC:* user <us...@spark.apache.org>
>> *Subject:* Re: Spark thrift driver memory leak
>> Hi,
>>
>> have you tried redshift or snowflake?
>>
>> SPARK is too complicated and too much of rocket science to manage simple
>> operations.
>>
>> Also if you are in AWS try to use EMR based Presto or Trino, they can
>> aggregate massive data at 1000x the cost.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
>> wrote:
>>
>>> Spark Thrift Server memory overflow after running for a period of time,
>>> thanks for any help!
>>>
>>>
>>> *ENV*
>>> Spark SQL (version 3.2.1)
>>> Driver: Hive JDBC (version 2.3.9)
>>> Hadoop 3.1.0.3.0.0.0-1634
>>>
>>> *Start Command*
>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>>> --properties-file
>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>>
>>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>>
>>> weicm@foxmail.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297

Re: Re: Spark thrift driver memory leak

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

Look at the number of variables that you have to look at and tweak - this
is rocket science. I think that even rockets have fewer variables available
on their dashboard than it takes for queries to run in an optimal way in
SPARK in a sustained manner.

This is the classic problem in the world now where people are losing jobs
because
> the question we are trying to answer is "how to make a rocket science
like SPARK work?"
> rather than asking "how to run the same queries at a fraction of the cost
throughout the lifecycle of the data?"

The only people who are gaining from the complications of running a simple
query are those:
> introducing those complications in infrastructure through completely and
utterly unnecessary containerisations, and
> the companies running out of SPARK
Everyone else is certainly not winning if the same solutions can be run at
a fraction of the cost in a market.



Regards,
Gourav Sengupta

On Wed, Dec 28, 2022 at 8:16 AM weicm@foxmail.com <we...@foxmail.com> wrote:

> Thank you very much for your suggestion, but we are using a private
> on-premises cluster in Spark on Hadoop mode, so I have to continue looking
> for a solution.
>
> ------------------------------
> weicm@foxmail.com
>
>
> *From:* Gourav Sengupta <go...@gmail.com>
> *Date:* 2022-12-28 15:06
> *To:* weicm@foxmail.com
> *CC:* user <us...@spark.apache.org>
> *Subject:* Re: Spark thrift driver memory leak
> Hi,
>
> have you tried redshift or snowflake?
>
> SPARK is too complicated and too much of rocket science to manage simple
> operations.
>
> Also if you are in AWS try to use EMR based Presto or Trino, they can
> aggregate massive data at 1000x the cost.
>
>
> Regards,
> Gourav Sengupta
>
> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
> wrote:
>
>> Spark Thrift Server memory overflow after running for a period of time,
>> thanks for any help!
>>
>>
>> *ENV*
>> Spark SQL (version 3.2.1)
>> Driver: Hive JDBC (version 2.3.9)
>> Hadoop 3.1.0.3.0.0.0-1634
>>
>> *Start Command*
>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>> --properties-file
>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>
>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>
>> weicm@foxmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>


Re: Spark thrift driver memory leak

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Ayan,

While your suggestion is welcome, it does not look like you answered the
user's question either :)

I am very much at peace, but it looks like yours is disrupted.

There are several pertinent answers to the queries posted in any forum;
here I am addressing the problem of selecting the right tool rather than
symptomatically applying a temporary fix. Tinkering with isolated issues is
one approach, and addressing the overall design issues and architecture
options is another. To say that one is pertinent or not is just a matter of
opinion.

Regards,
Gourav

On Wed, Dec 28, 2022 at 8:35 AM ayan guha <gu...@gmail.com> wrote:

> Hi Gaurav
>
> May I request to answer the question or hold your peace, please?
>
>  I have nothing but respect for all your opinions. However many of the
> cases users may not be right people to decide the high level design
> decision you are eluding to, and they are writing here to get some answers
> for their problems. At least thats the purpose of this email list.
>
> I understand you are trying to help but IMHO it is having opposite effect.
>
>
> On Wed, 28 Dec 2022 at 6:12 pm, Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> have you tried redshift or snowflake?
>>
>> SPARK is too complicated and too much of rocket science to manage simple
>> operations.
>>
>> Also if you are in AWS try to use EMR based Presto or Trino, they can
>> aggregate massive data at 1000x the cost.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
>> wrote:
>>
>>> Spark Thrift Server memory overflow after running for a period of time,
>>> thanks for any help!
>>>
>>>
>>> *ENV*
>>> Spark SQL (version 3.2.1)
>>> Driver: Hive JDBC (version 2.3.9)
>>> Hadoop 3.1.0.3.0.0.0-1634
>>>
>>> *Start Command*
>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>>> --properties-file
>>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>>
>>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>>
>>> weicm@foxmail.com
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>> --
> Best Regards,
> Ayan Guha
>

Re: Spark thrift driver memory leak

Posted by ayan guha <gu...@gmail.com>.
Hi Gourav,

May I request that you answer the question or hold your peace, please?

I have nothing but respect for all your opinions. However, in many cases
users may not be the right people to make the high-level design decisions
you are alluding to, and they are writing here to get some answers to their
problems. At least that's the purpose of this email list.

I understand you are trying to help, but IMHO it is having the opposite
effect.


On Wed, 28 Dec 2022 at 6:12 pm, Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
> have you tried redshift or snowflake?
>
> SPARK is too complicated and too much of rocket science to manage simple
> operations.
>
> Also if you are in AWS try to use EMR based Presto or Trino, they can
> aggregate massive data at 1000x the cost.
>
>
> Regards,
> Gourav Sengupta
>
> On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com>
> wrote:
>
>> Spark Thrift Server memory overflow after running for a period of time,
>> thanks for any help!
>>
>>
>> *ENV*
>> Spark SQL (version 3.2.1)
>> Driver: Hive JDBC (version 2.3.9)
>> Hadoop 3.1.0.3.0.0.0-1634
>>
>> *Start Command*
>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
>> --properties-file
>> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>>
>> *See attachment for configuration: spark-thrift-sparkconf.conf*
>>
>> weicm@foxmail.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
> --
Best Regards,
Ayan Guha

Re: Spark thrift driver memory leak

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

Have you tried Redshift or Snowflake?

SPARK is too complicated and too much rocket science for managing simple
operations.

Also, if you are in AWS, try EMR-based Presto or Trino; they can aggregate
massive data at 1000x the cost.


Regards,
Gourav Sengupta

On Wed, Dec 28, 2022 at 4:10 AM weicm@foxmail.com <we...@foxmail.com> wrote:

> Spark Thrift Server memory overflow after running for a period of time,
> thanks for any help!
>
>
> *ENV*
> Spark SQL (version 3.2.1)
> Driver: Hive JDBC (version 2.3.9)
> Hadoop 3.1.0.3.0.0.0-1634
>
> *Start Command*
> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/sbin/start-thriftserver.sh
> --properties-file
> /usr/hdp/3.0.0.0-1634/spark-3.2.1-bin-hadoop3.2/conf/spark-thrift-sparkconf.conf
>
> *See attachment for configuration: spark-thrift-sparkconf.conf*
>
> weicm@foxmail.com
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>