Posted to user@spark.apache.org by DB Tsai <db...@dbtsai.com> on 2014/07/27 05:30:45 UTC

Spark MLlib vs BIDMach Benchmark

BIDMach is a CPU- and GPU-accelerated machine learning library, also from Berkeley.

https://github.com/BIDData/BIDMach/wiki/Benchmarks

They benchmarked against Spark 0.9 and claim that BIDMach is
significantly faster than Spark MLlib. A lot of performance
optimization went into Spark 1.0, and sparse data is now supported,
so it would be interesting to see updated benchmark results.

Is anyone familiar with BIDMach? Is it as fast as they claim?

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

Re: Spark MLlib vs BIDMach Benchmark

Posted by Ameet Talwalkar <at...@gmail.com>.
To add to Matei's last point (that multiple models can be trained
simultaneously), multi-model training is something we've explored as
part of the MLbase Optimizer, and we've seen some nice speedups. This
feature will be added to MLlib soon (though I'm not sure it'll make it
into the 1.1 release).
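
As a rough sketch of why multi-model training pays off (toy data and
made-up sizes; this is not MLbase or MLlib code): stacking the k weight
vectors into a d x k matrix turns the k per-model dot products into a
single matrix multiply, so each pass over the data costs roughly one
GEMM instead of k separate passes.

    import breeze.linalg._
    import breeze.numerics.sigmoid

    // Illustrative only: k logistic regression models (e.g. a small
    // learning-rate grid) trained together, weights kept as columns of W.
    object MultiModelSketch extends App {
      val n = 1000; val d = 10; val k = 4        // points, features, models
      val X = DenseMatrix.rand(n, d)             // toy feature matrix
      val y = DenseVector.rand(n).map(v => if (v > 0.5) 1.0 else 0.0)
      val W = DenseMatrix.zeros[Double](d, k)    // one weight column per model
      val lrs = Array(0.01, 0.03, 0.1, 0.3)      // per-model step sizes

      for (_ <- 0 until 100) {
        val P = sigmoid(X * W)                   // n x k predictions: one GEMM
        val G = X.t * (P(::, *) - y)             // d x k gradients: one GEMM
        for (j <- 0 until k)                     // apply each model's step size
          W(::, j) -= G(::, j) * (lrs(j) / n.toDouble)
      }
      println(W(::, 0))                          // weights of the first model
    }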



Re: Spark MLlib vs BIDMach Benchmark

Posted by Matei Zaharia <ma...@gmail.com>.
BTW, one other thing that would help MLlib locally is doing model updates in batches: instead of operating on one point at a time, group a bunch of points together and apply a matrix operation, which allows more efficient use of BLAS or other linear algebra primitives. We don't do much of this yet, but there was a project in the AMPLab to do more of it. Multiple models can also be trained simultaneously with this approach.
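
To make the batching idea concrete, here is a minimal local sketch (toy
data and sizes; not MLlib code): logistic regression that takes one
gradient step per block of points, so the inner loop is two BLAS-level
matrix-vector products rather than a per-point update loop.

    import breeze.linalg._
    import breeze.numerics.sigmoid

    // Illustrative only: one update per block of `batch` points instead
    // of one update per point.
    object BatchedUpdateSketch extends App {
      val n = 10000; val d = 100; val batch = 256
      val X = DenseMatrix.rand(n, d)              // toy data
      val y = DenseVector.rand(n).map(v => if (v > 0.5) 1.0 else 0.0)
      val w = DenseVector.zeros[Double](d)
      val lr = 0.1

      for (start <- 0 until n by batch) {
        val end = math.min(start + batch, n)
        val Xb = X(start until end, ::)           // one block of points
        val yb = y(start until end)
        val p = sigmoid(Xb * w)                   // GEMV over the whole block
        val g = Xb.t * (p - yb)                   // second GEMV: block gradient
        w -= g * (lr / (end - start).toDouble)    // one update per block
      }
      println(w)
    }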


Re: Spark MLlib vs BIDMach Benchmark

Posted by Matei Zaharia <ma...@gmail.com>.
These numbers are from GPUs and Intel MKL (a closed-source math library for Intel processors), which for CPU-bound algorithms will be faster than MLlib's jblas. In theory, though, nothing prevents using these in MLlib (e.g. if you have a faster BLAS locally; adding a GPU-based one would probably require bigger code changes).
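
As one concrete illustration of the pluggable-BLAS point, here is a
sketch assuming the netlib-java wrapper (the library Breeze can delegate
to for native linear algebra; this is not how MLlib's jblas path works).
Which BLAS backs the call is decided at runtime, so installing MKL or
OpenBLAS speeds it up without any code changes:

    import com.github.fommil.netlib.BLAS

    // Illustrative only: netlib-java loads a native BLAS if one is present
    // and falls back to a pure-Java implementation otherwise.
    object BlasSketch extends App {
      val blas = BLAS.getInstance()               // native impl if available
      println(s"BLAS implementation: ${blas.getClass.getName}")

      // C := 1.0 * A * B + 0.0 * C, 2x2 matrices in column-major order.
      val a = Array(1.0, 2.0, 3.0, 4.0)           // A = [1 3; 2 4]
      val b = Array(5.0, 6.0, 7.0, 8.0)           // B = [5 7; 6 8]
      val c = new Array[Double](4)
      blas.dgemm("N", "N", 2, 2, 2, 1.0, a, 2, b, 2, 0.0, c, 2)
      println(c.mkString(", "))                   // 23.0, 34.0, 31.0, 46.0
    }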

Some of the numbers there are also from more naive implementations of K-means and logistic regression in the Spark research paper, which include the fairly expensive cost of reading the data out of HDFS.
