Posted to user@spark.apache.org by Raghavendra Pandey <ra...@gmail.com> on 2014/12/31 04:15:25 UTC

Spark app performance

I have a Spark app that involves a series of mapPartition operations followed
by a keyBy operation. I measured the time spent inside each mapPartition
function block, and these blocks take trivial time. Still, the application
takes far too long, and the Spark UI reports the same long duration.
So I was wondering where the time goes and how I can reduce it.

Thanks
Raghavendra

Re: Spark app performance

Posted by Raghavendra Pandey <ra...@gmail.com>.
I have seen that link. I am using an RDD of byte arrays and Kryo serialization.
When I measure time inside mapPartition, it is never more than 1 ms, whereas
the total time taken by the application is about 30 min. The codebase has a lot
of dependencies. I am trying to come up with a simple version where I can
reproduce this problem.
Also, the GC time reported by the Spark UI is always in the range of 3-4% of
total time.
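
One possible explanation, which nothing in this thread confirms: the function
passed to mapPartitions receives an iterator, and if it returns another lazy
iterator (for example the result of map), a timer placed inside the function
only measures building that iterator. The per-record work runs later, when
something downstream consumes it. Below is a minimal sketch of that effect in
plain Python with no Spark dependency. The names slow_transform and timed_lazy
are hypothetical stand-ins, and the sleep is an assumed placeholder for real
per-record work.

```python
import time

timings = {}

def slow_transform(record):
    # Hypothetical stand-in for real per-record work.
    time.sleep(0.005)
    return record * 2

def timed_lazy(partition):
    # Shaped like a function passed to mapPartitions: iterator in, iterator out.
    start = time.perf_counter()
    result = map(slow_transform, partition)  # lazy: no record is processed here
    timings["inside_block"] = time.perf_counter() - start
    return result

# Simulate the framework handing over one partition and consuming the result.
partition = iter(range(20))
lazy_output = timed_lazy(partition)

start = time.perf_counter()
consumed = list(lazy_output)  # the per-record work actually happens here
timings["consumption"] = time.perf_counter() - start

print("measured inside block: %.6f s" % timings["inside_block"])
print("measured at consumption: %.6f s" % timings["consumption"])
```

If this is what is happening, timing the loop that actually walks the records
(or materializing the partition inside the function while testing) should show
where the time goes. Shuffle, serialization and scheduling costs would still
need to be checked separately in the Spark UI stage details.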

On Thu, Jan 1, 2015, 14:05 Akhil Das <ak...@sigmoidanalytics.com> wrote:

> It would be great if you could share the code that runs inside your
> mapPartition. I'm assuming you are creating or handling a lot of complex
> objects, which slows down performance. Here's a link
> <http://spark.apache.org/docs/latest/tuning.html> to the performance tuning
> guide if you haven't seen it already.
>
> Thanks
> Best Regards

Re: Spark app performance

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
It would be great if you could share the code that runs inside your
mapPartition. I'm assuming you are creating or handling a lot of complex
objects, which slows down performance. Here's a link
<http://spark.apache.org/docs/latest/tuning.html> to the performance tuning
guide if you haven't seen it already.

Thanks
Best Regards

On Wed, Dec 31, 2014 at 8:45 AM, Raghavendra Pandey <
raghavendra.pandey@gmail.com> wrote:

> I have a Spark app that involves a series of mapPartition operations followed
> by a keyBy operation. I measured the time spent inside each mapPartition
> function block, and these blocks take trivial time. Still, the application
> takes far too long, and the Spark UI reports the same long duration.
> So I was wondering where the time goes and how I can reduce it.
>
> Thanks
> Raghavendra
>