Posted to user@spark.apache.org by 尹绪森 <yi...@gmail.com> on 2014/02/02 05:27:23 UTC

Re: Spark app gets slower as it gets executed more times

Is your Spark app an iterative one? If so, your app is creating a bigger
and bigger DAG with every iteration. You should checkpoint it periodically,
say, one checkpoint every 10 iterations.
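
For illustration, a minimal sketch (not from the original thread) of what
that periodic checkpointing might look like in Scala, assuming sc is an
existing SparkContext and that initialRDD, step, numIterations, and the
checkpoint path are all placeholders:

    // Hedged sketch: truncate the RDD lineage every 10 iterations so the
    // DAG does not grow without bound across iterations.
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints") // placeholder path

    var rdd = initialRDD                 // your starting RDD
    for (i <- 1 to numIterations) {
      rdd = step(rdd).cache()            // step() stands in for one iteration's transformations
      if (i % 10 == 0) {
        rdd.checkpoint()                 // mark the RDD for checkpointing
        rdd.count()                      // force evaluation so the checkpoint is actually written
      }
    }

Checkpointing writes the RDD's data to stable storage and drops its lineage,
so later iterations no longer carry the full history of transformations.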


2014-02-01 Aureliano Buendia <bu...@gmail.com>:

> Hi,
>
> I've noticed my Spark app (on EC2) gets slower and slower as I repeatedly
> execute it.
>
> With a fresh EC2 cluster, it is snappy and takes about 15 mins to
> complete; after running the same app 4 times, it slows down and takes 40
> mins or more.
>
> While the cluster gets slower, the monitoring metrics show less and less
> activity (almost no CPU or IO).
>
> When it gets slow, sometimes the number of running tasks (light blue in
> the web UI progress bar) is zero, and only the number of completed tasks
> (dark blue) increments.
>
> Is this a known Spark issue?
>
> Do I need to restart the Spark master and workers in between running apps?
>



-- 
Best Regards
-----------------------------------
Xusen Yin    尹绪森
Beijing Key Laboratory of Intelligent Telecommunications Software and
Multimedia
Beijing University of Posts & Telecommunications
Intel Labs China
Homepage: http://yinxusen.github.io/

Re: Spark app gets slower as it gets executed more times

Posted by Aureliano Buendia <bu...@gmail.com>.
On Fri, Feb 7, 2014 at 7:48 AM, Aaron Davidson <il...@gmail.com> wrote:

> Sorry for the delay; by long-running I just meant whether you were running
> an iterative algorithm that was slowing down over time. We have observed
> this in the spark-perf benchmark: as file system state builds up, the job
> can slow down. Once the job finishes, however, it is cleaned up and should
> not affect subsequent jobs.
>
> I can think of three other possibilities for a slowdown: (1) an unclean
> shutdown of Spark (i.e., kill -9), which doesn't allow us to clean up our
> data
>

By 'shutdown of Spark', do you mean shutting down the Spark app, or the
Spark cluster?

How is it possible to gracefully shut down a Spark app?
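
For reference: the usual way to end the app itself cleanly is to call
stop() on the SparkContext once the job is done; shutting down a standalone
cluster is a separate step, done with the stop-all.sh deploy script on the
master. A minimal sketch, with the object and app name as placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hedged sketch: stop the SparkContext even if the job body throws, so
    // executors are released and Spark can clean up its temporary state.
    object MyApp {
      def main(args: Array[String]) {
        val conf = new SparkConf().setAppName("my-app") // placeholder name
        val sc = new SparkContext(conf)
        try {
          // ... job logic here ...
        } finally {
          sc.stop() // graceful shutdown of the app, not the cluster
        }
      }
    }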


> (2) a buildup of logs in the work/ directory or files in the Spark tmp
> directory, and (3) a bug in Spark (woo!).
>
>
> On Tue, Feb 4, 2014 at 5:58 AM, Aureliano Buendia <bu...@gmail.com> wrote:
>
>>
>>
>>
>> On Mon, Feb 3, 2014 at 12:26 AM, Aaron Davidson <il...@gmail.com> wrote:
>>
>>> Are you seeing any exceptions in between running apps? Does restarting
>>> the master/workers actually cause Spark to speed back up again? It's
>>> possible, for instance, that you run out of disk space, which should
>>> cause exceptions but would not go away just by restarting the
>>> master/workers.
>>>
>>
>> Not really: no exceptions, and plenty of disk space left. At this point
>> I'm not certain that restarting the Spark master/workers definitely
>> helps. The only thing that does help is bringing up a fresh EC2 cluster,
>> which is not a solution. This could suggest that Spark leaves behind some
>> state that builds up every time the app is executed.
>>
>>
>>>
>>> One thing to worry about is long-running jobs or shells.
>>>
>>
>> What do you mean by long-running jobs?
>>
>>
>>> Currently, state buildup within a single job in Spark *is* a problem, as
>>> certain state such as shuffle files and RDD metadata is not cleaned up
>>> until the job (or shell) exits. We have hacky ways to reduce this, and
>>> are working on a long-term solution. However, separate, consecutive jobs
>>> should be independent in terms of performance.
>>>
>>>
>>> On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <yi...@gmail.com> wrote:
>>>
>>>> Is your Spark app an iterative one? If so, your app is creating a
>>>> bigger and bigger DAG with every iteration. You should checkpoint it
>>>> periodically, say, one checkpoint every 10 iterations.
>>>>
>>>>
>>>> 2014-02-01 Aureliano Buendia <bu...@gmail.com>:
>>>>
>>>> Hi,
>>>>>
>>>>> I've noticed my Spark app (on EC2) gets slower and slower as I
>>>>> repeatedly execute it.
>>>>>
>>>>> With a fresh EC2 cluster, it is snappy and takes about 15 mins to
>>>>> complete; after running the same app 4 times, it slows down and takes
>>>>> 40 mins or more.
>>>>>
>>>>> While the cluster gets slower, the monitoring metrics show less and
>>>>> less activity (almost no CPU or IO).
>>>>>
>>>>> When it gets slow, sometimes the number of running tasks (light blue
>>>>> in the web UI progress bar) is zero, and only the number of completed
>>>>> tasks (dark blue) increments.
>>>>>
>>>>> Is this a known Spark issue?
>>>>>
>>>>> Do I need to restart the Spark master and workers in between running
>>>>> apps?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>> -----------------------------------
>>>> Xusen Yin    尹绪森
>>>> Beijing Key Laboratory of Intelligent Telecommunications Software and
>>>> Multimedia
>>>> Beijing University of Posts & Telecommunications
>>>> Intel Labs China
>>>> Homepage: http://yinxusen.github.io/
>>>>
>>>
>>>
>>
>

Re: Spark app gets slower as it gets executed more times

Posted by Aaron Davidson <il...@gmail.com>.
Sorry for the delay; by long-running I just meant whether you were running
an iterative algorithm that was slowing down over time. We have observed
this in the spark-perf benchmark: as file system state builds up, the job
can slow down. Once the job finishes, however, it is cleaned up and should
not affect subsequent jobs.

I can think of three other possibilities for a slowdown: (1) an unclean
shutdown of Spark (i.e., kill -9), which doesn't allow us to clean up our
data, (2) a buildup of logs in the work/ directory or files in the Spark
tmp directory, and (3) a bug in Spark (woo!).


On Tue, Feb 4, 2014 at 5:58 AM, Aureliano Buendia <bu...@gmail.com> wrote:

>
>
>
> On Mon, Feb 3, 2014 at 12:26 AM, Aaron Davidson <il...@gmail.com> wrote:
>
>> Are you seeing any exceptions in between running apps? Does restarting
>> the master/workers actually cause Spark to speed back up again? It's
>> possible, for instance, that you run out of disk space, which should
>> cause exceptions but would not go away just by restarting the
>> master/workers.
>>
>
> Not really: no exceptions, and plenty of disk space left. At this point
> I'm not certain that restarting the Spark master/workers definitely helps.
> The only thing that does help is bringing up a fresh EC2 cluster, which is
> not a solution. This could suggest that Spark leaves behind some state
> that builds up every time the app is executed.
>
>
>>
>> One thing to worry about is long-running jobs or shells.
>>
>
> What do you mean by long-running jobs?
>
>
>> Currently, state buildup within a single job in Spark *is* a problem, as
>> certain state such as shuffle files and RDD metadata is not cleaned up
>> until the job (or shell) exits. We have hacky ways to reduce this, and
>> are working on a long-term solution. However, separate, consecutive jobs
>> should be independent in terms of performance.
>>
>>
>> On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <yi...@gmail.com> wrote:
>>
>>> Is your Spark app an iterative one? If so, your app is creating a
>>> bigger and bigger DAG with every iteration. You should checkpoint it
>>> periodically, say, one checkpoint every 10 iterations.
>>>
>>>
>>> 2014-02-01 Aureliano Buendia <bu...@gmail.com>:
>>>
>>> Hi,
>>>>
>>>> I've noticed my Spark app (on EC2) gets slower and slower as I
>>>> repeatedly execute it.
>>>>
>>>> With a fresh EC2 cluster, it is snappy and takes about 15 mins to
>>>> complete; after running the same app 4 times, it slows down and takes
>>>> 40 mins or more.
>>>>
>>>> While the cluster gets slower, the monitoring metrics show less and
>>>> less activity (almost no CPU or IO).
>>>>
>>>> When it gets slow, sometimes the number of running tasks (light blue in
>>>> the web UI progress bar) is zero, and only the number of completed
>>>> tasks (dark blue) increments.
>>>>
>>>> Is this a known Spark issue?
>>>>
>>>> Do I need to restart the Spark master and workers in between running apps?
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards
>>> -----------------------------------
>>> Xusen Yin    尹绪森
>>> Beijing Key Laboratory of Intelligent Telecommunications Software and
>>> Multimedia
>>> Beijing University of Posts & Telecommunications
>>> Intel Labs China
>>> Homepage: http://yinxusen.github.io/
>>>
>>
>>
>

Re: Spark app gets slower as it gets executed more times

Posted by Aureliano Buendia <bu...@gmail.com>.
On Mon, Feb 3, 2014 at 12:26 AM, Aaron Davidson <il...@gmail.com> wrote:

> Are you seeing any exceptions in between running apps? Does restarting the
> master/workers actually cause Spark to speed back up again? It's possible,
> for instance, that you run out of disk space, which should cause
> exceptions but would not go away just by restarting the master/workers.
>

Not really: no exceptions, and plenty of disk space left. At this point I'm
not certain that restarting the Spark master/workers definitely helps. The
only thing that does help is bringing up a fresh EC2 cluster, which is not
a solution. This could suggest that Spark leaves behind some state that
builds up every time the app is executed.


>
> One thing to worry about is long-running jobs or shells.
>

What do you mean by long-running jobs?


> Currently, state buildup within a single job in Spark *is* a problem, as
> certain state such as shuffle files and RDD metadata is not cleaned up
> until the job (or shell) exits. We have hacky ways to reduce this, and are
> working on a long-term solution. However, separate, consecutive jobs
> should be independent in terms of performance.
>
>
> On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <yi...@gmail.com> wrote:
>
>> Is your Spark app an iterative one? If so, your app is creating a bigger
>> and bigger DAG with every iteration. You should checkpoint it
>> periodically, say, one checkpoint every 10 iterations.
>>
>>
>> 2014-02-01 Aureliano Buendia <bu...@gmail.com>:
>>
>> Hi,
>>>
>>> I've noticed my Spark app (on EC2) gets slower and slower as I
>>> repeatedly execute it.
>>>
>>> With a fresh EC2 cluster, it is snappy and takes about 15 mins to
>>> complete; after running the same app 4 times, it slows down and takes 40
>>> mins or more.
>>>
>>> While the cluster gets slower, the monitoring metrics show less and less
>>> activity (almost no CPU or IO).
>>>
>>> When it gets slow, sometimes the number of running tasks (light blue in
>>> the web UI progress bar) is zero, and only the number of completed tasks
>>> (dark blue) increments.
>>>
>>> Is this a known Spark issue?
>>>
>>> Do I need to restart the Spark master and workers in between running apps?
>>>
>>
>>
>>
>> --
>> Best Regards
>> -----------------------------------
>> Xusen Yin    尹绪森
>> Beijing Key Laboratory of Intelligent Telecommunications Software and
>> Multimedia
>> Beijing University of Posts & Telecommunications
>> Intel Labs China
>> Homepage: http://yinxusen.github.io/
>>
>
>

Re: Spark app gets slower as it gets executed more times

Posted by Aaron Davidson <il...@gmail.com>.
Are you seeing any exceptions in between running apps? Does restarting the
master/workers actually cause Spark to speed back up again? It's possible,
for instance, that you run out of disk space, which should cause exceptions
but would not go away just by restarting the master/workers.

One thing to worry about is long-running jobs or shells. Currently, state
buildup within a single job in Spark *is* a problem, as certain state such
as shuffle files and RDD metadata is not cleaned up until the job (or
shell) exits. We have hacky ways to reduce this, and are working on a
long-term solution. However, separate, consecutive jobs should be
independent in terms of performance.
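
One knob from this era that may be among the "hacky ways" mentioned above
is the periodic metadata cleaner, controlled by the spark.cleaner.ttl
setting. A hedged sketch of opting in, with the app name as a placeholder;
the value is in seconds, and too small a TTL can remove shuffle data that a
long-lived job still needs:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hedged sketch: enable Spark's periodic cleaner so metadata (and the
    // state hanging off it) older than the TTL is forgotten. Setting the
    // TTL too low can drop shuffle data a long-running job still needs.
    val conf = new SparkConf()
      .setAppName("long-running-app")   // placeholder name
      .set("spark.cleaner.ttl", "3600") // seconds: forget state older than 1 hour
    val sc = new SparkContext(conf)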


On Sat, Feb 1, 2014 at 8:27 PM, 尹绪森 <yi...@gmail.com> wrote:

> Is your Spark app an iterative one? If so, your app is creating a bigger
> and bigger DAG with every iteration. You should checkpoint it
> periodically, say, one checkpoint every 10 iterations.
>
>
> 2014-02-01 Aureliano Buendia <bu...@gmail.com>:
>
> Hi,
>>
>> I've noticed my Spark app (on EC2) gets slower and slower as I repeatedly
>> execute it.
>>
>> With a fresh EC2 cluster, it is snappy and takes about 15 mins to
>> complete; after running the same app 4 times, it slows down and takes 40
>> mins or more.
>>
>> While the cluster gets slower, the monitoring metrics show less and less
>> activity (almost no CPU or IO).
>>
>> When it gets slow, sometimes the number of running tasks (light blue in
>> the web UI progress bar) is zero, and only the number of completed tasks
>> (dark blue) increments.
>>
>> Is this a known Spark issue?
>>
>> Do I need to restart the Spark master and workers in between running apps?
>>
>
>
>
> --
> Best Regards
> -----------------------------------
> Xusen Yin    尹绪森
> Beijing Key Laboratory of Intelligent Telecommunications Software and
> Multimedia
> Beijing University of Posts & Telecommunications
> Intel Labs China
> Homepage: http://yinxusen.github.io/
>