Posted to user@mahout.apache.org by thibaut <th...@gmail.com> on 2016/05/05 19:45:56 UTC

Matrix inversion

Hi there,

I work in a mathematics research group at Michigan University, and we are looking to speed up some of the algorithms we use and have developed here by running them on distributed systems.

We were considering Spark, but I recently found Mahout and read about it. We use KNN and Minimum Spanning Tree a lot, and our main concern is inverting matrices (really, really big matrices).

I found this paper, "Spark-based Large-scale Matrix Inversion for Big Data Processing" (https://web.njit.edu/~ansari/papers/16IEEEAccess.pdf), which provides a really good method for dealing with the inversion issue.

My questions are:
- For what we want to do, is it better to use Mahout or Spark?
- I saw that you already have a distributed PCA. Do you have a really efficient matrix inversion algorithm in Mahout?
- How good is the linear algebra library compared to Matlab, for example?

Finally, our main concern with Spark is the linear algebra library it uses, and we were wondering how good the Mahout one is.

Thanking you in advance,

Best regards.
Thibaut

Re: Matrix inversion

Posted by Ted Dunning <te...@gmail.com>.

Mahout is considerably better at sparse operations and optimizations than
dense ones.

Beyond that, I would expect that you would do better with traditional math
libraries.

And, are you really trying to invert a matrix? The common maxim is that
this implies an error in your method, because inversion is O(n^3) and often
ill-conditioned to boot.  Usually, an implicit form of inversion via a
decomposition is far better than a true inversion. For large systems the
situation is even starker: numerical accuracy limitations and noise in the
original data make it impossible to do better than an approximate AND
implicit inverse, such as a limited-rank SVD.
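
A minimal sketch of "implicit inversion via a decomposition", in plain
Python (the language and the function names lu_factor and lu_solve are
assumptions for illustration, not anything from Mahout or the paper). It
factors A once with partially pivoted LU and then solves A x = b by
substitution, so inv(A) is never formed:

```python
def lu_factor(a):
    """Factor a square matrix as P*A = L*U with partial pivoting.

    Returns (lu, perm): lu holds U on and above the diagonal and the
    multipliers of L below it; perm[i] is the original index of row i.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Choose the largest entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    # Forward substitution: L y = P b (L has an implicit unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / lu[i][i]
    return x


# Factor once, then reuse the factors for several right-hand sides.
factors, order = lu_factor([[4.0, 3.0], [6.0, 3.0]])
for rhs in ([10.0, 12.0], [1.0, 0.0]):
    print(rhs, "->", [round(v, 6) for v in lu_solve(factors, order, rhs)])
```

The factorization is the O(n^3) step; each later solve is only O(n^2), so
the factors do the job of the inverse without it ever being computed.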




Re: Matrix inversion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

BTW, Thibaut, in the paper you mention, the MPI-based implementation is at
least twice as fast as Spark at the inversion. Kinda what I was saying --
and in this case the algorithm doesn't seem to be as highly interconnected
as, e.g., naive blockwise multiplication.
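
To illustrate why naive blockwise multiplication counts as highly
interconnected, here is a small single-process sketch in plain Python (the
language, the helper names, and the "fetched" counter are all assumptions
for illustration; the counter is only a stand-in for blocks that a
distributed job would have to move between workers):

```python
def matmul(a, b):
    """Dense product of two small matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_blocks(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    n = len(m)
    if n % q != 0:
        raise ValueError("q must divide the matrix size")
    s = n // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Reassemble a grid of blocks into one matrix."""
    out = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            out.append([v for block in block_row for v in block[r]])
    return out


def blockwise_multiply(a, b, q):
    """Naive blockwise product: C[i][j] = sum over k of A[i][k] * B[k][j].

    Returns the result as a block grid plus the number of input blocks
    read, which is 2 * q**3 for this scheme.
    """
    ab, bb = split_blocks(a, q), split_blocks(b, q)
    s = len(a) // q
    fetched = 0
    grid = []
    for i in range(q):
        grid_row = []
        for j in range(q):
            acc = [[0] * s for _ in range(s)]
            for k in range(q):
                # Every output block needs a whole block row of A and a
                # whole block column of B.
                acc = add(acc, matmul(ab[i][k], bb[k][j]))
                fetched += 2
            grid_row.append(acc)
        grid.append(grid_row)
    return grid, fetched


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
grid, fetched = blockwise_multiply(a, identity, 2)
print(join_blocks(grid) == a, fetched)
```

With a q x q grid there are only q^2 output blocks, but the scheme reads
2 * q^3 input blocks, so each input block is needed by q different outputs.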

Re: Matrix inversion

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

The mantra I keep hearing is that if someone needs matrix inversion, then
he/she must be doing something wrong. Not sure how true that is, but in all
the cases I have encountered, people try to avoid matrix inversion one way
or another.

Re: libraries: Mahout is more about APIs now than any particular in-core
library. Unfortunately, Mahout's in-memory operations are rooted in
single-threaded Colt and are pretty slow at the moment. We are looking for
ways of doing in-memory operations faster and integrating something better
and native.

However, the really limiting factor seems to be the Spark programming model
and the effect it has on interconnected I/O problems with a high degree of
scattering. Compare, for example, the performance you can get with the MKL
MPI wrapper. If you are looking for performance of distributed algebra on
CPUs, there are very few things that can compete with the MKL MPI wrapper.

My personal opinion is that as long as the problem fits in memory (and most
of them do nowadays), no algorithm on Spark is going to beat Matlab in
matrix multiplication and the like, all things being equal, no matter how
many cores the Spark cluster gets, on 1 Gbit networks. The same seems to be
10-fold true when comparing to GPU-based algorithms (case in point:
BIDMach).

On Thu, May 5, 2016 at 12:45 PM, thibaut <th...@gmail.com> wrote:

> My questions are:
> - For what we want to do, is it better to use Mahout or Spark?

Mahout at this point is better for declarative prototyping, as it contains
a distributed optimizer and a compact expression DSL.

> - I saw that you already have a distributed PCA. Do you have a really
> efficient matrix inversion algorithm in Mahout?

PCA underpinnings are described in detail in the "Apache Mahout: Beyond
MapReduce" book.

> - How good is the linear algebra library compared to Matlab, for example?

See my opinion above about algorithms on Spark. Yes, I did some
benchmarking and digging around. Some things could be on par, but
interconnected things are decidedly worse than single-node Matlab (in terms
of speed).

> Finally, our main concern with Spark is the linear algebra library it
> uses, and we were wondering how good the Mahout one is.

What do you mean specifically? Speed? As I said, the in-core speed is what
one can expect from a Java-based implementation, but the in-core speed
factor seems to be far overshadowed by I/O programming model issues in
highly interconnected problems once the problem reaches a certain size.

> Thanking you in advance,
>
> Best regards.
> Thibaut