Posted to user@spark.apache.org by Wenlei Xie <we...@gmail.com> on 2013/11/04 21:03:41 UTC

Performance drop / unstable in 0.8 release

Hi,

I have an iterative program written in Spark that I previously tested under
a snapshot version of Spark 0.8. After porting it to the 0.8 release, I see
performance drops on large datasets. Does anyone have a clue why?

I monitored the number of cached partitions on each machine (by looking at
DAGScheduler.getCacheLocs). I observed that a machine may hold 30
partitions in one iteration but fewer than 10 in the next. I did not
observe this with the older version, so I am wondering whether the release
version does task stealing more aggressively (for better dynamic load
balancing?).

Thank you!

Best Regards,
Wenlei
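The per-machine observation above can be made concrete by tallying cached partitions per host in successive iterations and flagging hosts whose count drops sharply. This is a minimal sketch in Python, assuming the cache locations have already been exported as a plain list indexed by partition id, each entry listing the hosts that hold that partition. That export format is a hypothetical stand-in for what DAGScheduler.getCacheLocs reports, not its actual return type, and the function names and the min_drop threshold are also hypothetical.

```python
from collections import Counter


def partitions_per_host(cache_locs):
    """Count cached partitions per host.

    cache_locs: list indexed by partition id; each entry lists the hosts
    holding a cached copy of that partition (hypothetical export format).
    """
    counts = Counter()
    for hosts in cache_locs:
        for host in set(hosts):  # count a partition at most once per host
            counts[host] += 1
    return counts


def shifted_hosts(prev, curr, min_drop=10):
    """Return hosts whose cached-partition count fell by at least min_drop."""
    hosts = set(prev) | set(curr)
    return sorted(h for h in hosts
                  if prev.get(h, 0) - curr.get(h, 0) >= min_drop)


# Illustrative data only: mimics "30 partitions, then fewer than 10".
iter1 = [["hostA"]] * 30 + [["hostB"]] * 10
iter2 = [["hostA"]] * 8 + [["hostB"]] * 32
prev = partitions_per_host(iter1)
curr = partitions_per_host(iter2)
print(dict(prev))                 # {'hostA': 30, 'hostB': 10}
print(shifted_hosts(prev, curr))  # ['hostA']
```

Only drops are reported; a host that gains partitions (hostB here) is not flagged, so running the check in both directions would show where the partitions moved.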

Re: Performance drop / unstable in 0.8 release

Posted by Wenlei Xie <we...@gmail.com>.
Hi,

I have all the code for the previous 0.8 version, but how can I find out
which SNAPSHOT it was? (project/SparkBuild.scala just says
version := "0.8.0-SNAPSHOT".)

Best,
Wenlei


On Wed, Nov 6, 2013 at 12:09 AM, Reynold Xin <rx...@apache.org> wrote:

> I don't even think task stealing / speculative execution is turned on by
> default. Do you know what snapshot version you used for 0.8 previously?


-- 
Wenlei Xie (谢文磊)

Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: wenlei.xie@gmail.com

Re: Performance drop / unstable in 0.8 release

Posted by Reynold Xin <rx...@apache.org>.
I don't even think task stealing / speculative execution is turned on by
default. Do you know what snapshot version you used for 0.8 previously?
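Reynold's point can be illustrated with a simplified model of how a speculation check decides which running tasks to duplicate. This is a sketch, not Spark's scheduler code. It assumes the quantile/multiplier rule described for the spark.speculation, spark.speculation.quantile and spark.speculation.multiplier settings (off by default, 0.75 and 1.5). The 100 ms floor, the function name and the bookkeeping of finished and running tasks are simplifying assumptions. With enabled left at its default of False nothing is ever re-launched, which matches Reynold's remark that speculation should not be the cause unless it was switched on.

```python
import math
from statistics import median


def speculatable_tasks(finished_ms, running_start_ms, now_ms, enabled=False,
                       quantile=0.75, multiplier=1.5, min_threshold_ms=100):
    """Return sorted ids of running tasks a speculation rule would duplicate.

    finished_ms: durations (ms) of successfully finished tasks in the stage.
    running_start_ms: dict of task id -> launch time (ms) for running tasks.
    Simplification: the stage's tasks are exactly finished + running.
    """
    if not enabled:
        return []  # speculation off (the default): never launch copies
    total = len(finished_ms) + len(running_start_ms)
    min_finished = math.floor(quantile * total)
    if not finished_ms or len(finished_ms) < min_finished:
        return []  # too few tasks finished to judge what "slow" means
    # A task counts as a straggler once it runs longer than multiplier x median.
    threshold = max(multiplier * median(finished_ms), min_threshold_ms)
    return sorted(tid for tid, start in running_start_ms.items()
                  if now_ms - start > threshold)


finished = [1000, 1100, 900, 1000, 1200, 950]
running = {"t6": 0, "t7": 2000}
print(speculatable_tasks(finished, running, now_ms=3000))                # []
print(speculatable_tasks(finished, running, now_ms=3000, enabled=True))  # ['t6']
```

Here the median finished duration is 1000 ms, so the threshold is 1500 ms: t6 (running for 3000 ms) would be duplicated, while t7 (1000 ms) would not.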
