
Posted to user@spark.apache.org by SK <sk...@gmail.com> on 2014/09/19 21:17:53 UTC

mllib performance on mesos cluster

Hi,

I have a program similar to the BinaryClassifier example that I am running
using my data (which is fairly small). I run this for 100 iterations. I
observed the following performance:

Standalone mode cluster with 10 nodes (with Spark 1.0.2):  5 minutes
Standalone mode cluster with 10 nodes (with Spark 1.1.0):  8.9 minutes
Mesos cluster with 10 nodes (with Spark 1.1.0): 17 minutes

1) According to the documentation, Spark 1.1.0 has better performance. So I
would like to understand why the runtime is longer on Spark 1.1.0. 

2) Why is the runtime on Mesos significantly longer than in standalone
mode?  I just wanted to find out if anyone else has observed poor
performance for MLlib-based programs on a Mesos cluster. I looked through the
application detail logs and found that some of the scheduler delay values
were higher on Mesos compared to standalone mode (40 ms vs. 10 ms).  Is the
Mesos scheduler slower?

thanks
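
For reference, the relative slowdowns implied by the runtimes above can be
worked out directly. This is a small sketch in Python that uses only the
figures reported in this message; the dictionary keys and the helper name
are illustrative labels, not anything defined by Spark.

```python
# Runtimes reported above, in minutes (10 nodes, 100 iterations).
runtimes_min = {
    "standalone-1.0.2": 5.0,
    "standalone-1.1.0": 8.9,
    "mesos-1.1.0": 17.0,
}


def slowdown(slower, faster):
    """Return how many times longer `slower` ran than `faster`."""
    return runtimes_min[slower] / runtimes_min[faster]


# Version effect: same cluster manager, different Spark release.
version_ratio = slowdown("standalone-1.1.0", "standalone-1.0.2")
# Cluster-manager effect: same Spark release, Mesos vs. standalone.
manager_ratio = slowdown("mesos-1.1.0", "standalone-1.1.0")
# Scheduler delay per task seen in the logs: 40 ms vs. 10 ms.
delay_ratio = 40 / 10

print(f"1.1.0 vs 1.0.2 (standalone): {version_ratio:.2f}x")
print(f"Mesos vs standalone (1.1.0): {manager_ratio:.2f}x")
print(f"Scheduler delay, Mesos vs standalone: {delay_ratio:.1f}x")
```

The scheduler delay differs by about 4x while the total runtime differs by
a bit under 2x. That would be consistent with per-task scheduling overhead
being a large share of each short task on a small dataset, though these
numbers alone do not prove it.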



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/mllib-performance-on-mesos-cluster-tp14692.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: mllib performance on mesos cluster

Posted by Sudha Krishna <sk...@gmail.com>.

Setting spark.mesos.coarse=true helped reduce the time on the Mesos cluster
from 17 min to around 6 min. The scheduler delay per task dropped from 40
ms to around 10 ms.

thanks
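
For anyone wanting to try the same setting: spark.mesos.coarse is an
ordinary Spark property, so it can be passed to spark-submit with --conf
(or set in conf/spark-defaults.conf or on a SparkConf). Below is a minimal
sketch in Python that only assembles such a command line. The master URL,
the application file name, and the helper function are placeholders made up
for illustration, not values from this thread.

```python
import shlex


def spark_submit_command(master, app, conf):
    """Build a spark-submit argv list with one --conf flag per property."""
    argv = ["spark-submit", "--master", master]
    for key, value in sorted(conf.items()):
        argv += ["--conf", f"{key}={value}"]
    argv.append(app)
    return argv


# Hypothetical master URL and application file; substitute your own.
cmd = spark_submit_command(
    master="mesos://mesos-master.example.com:5050",
    app="binary_classifier.py",
    conf={"spark.mesos.coarse": "true"},
)

# Print a shell-ready command line.
print(" ".join(shlex.quote(part) for part in cmd))
```

In Spark 1.1.0 the default on Mesos is fine-grained mode, where each Spark
task runs as its own Mesos task. Coarse-grained mode instead keeps one
long-running Mesos task per machine and schedules Spark tasks inside it,
which lowers per-task startup overhead at the cost of holding resources for
the whole application.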



Re: mllib performance on mesos cluster

Posted by Xiangrui Meng <me...@gmail.com>.

1) MLlib 1.1 should be faster than 1.0 in general. What's the size of
your dataset? Is the RDD evenly distributed across nodes? You can
check the Storage tab of the Spark web UI.

2) I don't have much experience with Mesos deployment. Someone else
may be able to answer your question.

-Xiangrui
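
One way to put a number on "evenly distributed" is to compare the largest
partition with the average one. This is a minimal sketch in plain Python;
the function name, the sample counts, and the choice of max/mean as the
skew measure are illustrative assumptions. In PySpark the per-partition
counts could be gathered with rdd.glom().map(len).collect(), assuming the
RDD is small enough for that to be cheap.

```python
def partition_skew(counts):
    """Return max/mean of per-partition record counts (1.0 means even)."""
    if not counts or sum(counts) == 0:
        raise ValueError("need at least one non-empty partition")
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Made-up counts for illustration only, one entry per partition.
even = [1000] * 10
skewed = [9100] + [100] * 9

print(partition_skew(even))
print(partition_skew(skewed))
```

A value close to 1.0 means the records are spread evenly. A value several
times larger suggests a few partitions hold most of the data, which would
leave some nodes idle during each iteration.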

