Posted to user@spark.apache.org by Cooper <ah...@gmail.com> on 2016/09/27 02:05:42 UTC

Large-scale matrix inverse in Spark

How is large-scale matrix inversion approached in Apache Spark?

This linear algebra operation underlies many other algorithms (regression,
classification, etc.). However, I have not been able to find a Spark API that
implements matrix inversion in parallel. Could you please clarify how this
operation is handled in the Spark internals?

Here <http://ieeexplore.ieee.org/abstract/document/7562171/> is a paper on
parallelized matrix inversion in Spark. However, I would rather use existing
code, if any is available, than implement it from scratch.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Large-scale-matrix-inverse-in-Spark-tp27796.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Large-scale matrix inverse in Spark

Posted by Robineast <Ro...@xense.co.uk>.
The paper you mention references a Spark-based LU decomposition approach. AFAIK there is no current implementation in Spark, but there is an open JIRA (https://issues.apache.org/jira/browse/SPARK-8514) that covers this. It seems to have gone quiet, though.
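
For readers unfamiliar with the LU approach, here is a minimal single-machine sketch in plain Python. It is not the algorithm from the paper or the JIRA, and it uses no Spark API; the function names (lu_decompose, lu_solve, inverse_via_lu) are made up for illustration. It factors P*A = L*U with partial pivoting, solves A x = b by substitution, and only builds the inverse by solving against the identity columns when the inverse is really wanted.

```python
def lu_decompose(a):
    """Factor P*A = L*U with partial pivoting.

    Returns (lu, perm). lu packs L below the diagonal (unit diagonal implied)
    and U on and above it; perm[i] is the original index of row i.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b given the factorization from lu_decompose."""
    n = len(lu)
    # Forward substitution: L y = P b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[perm[i]] - sum(lu[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))) / lu[i][i]
    return x


def inverse_via_lu(a):
    """Build the inverse column by column by solving A x = e_j."""
    n = len(a)
    lu, perm = lu_decompose(a)
    cols = [lu_solve(lu, perm, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    # cols[j] is column j of the inverse; transpose to row-major order.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


a = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
lu, perm = lu_decompose(a)
print(lu_solve(lu, perm, [5, -2, 9]))   # approximately [1.0, 1.0, 2.0]
```

Once the factorization exists, each additional right-hand side costs only the two substitution passes, which is why a solve is usually preferred over forming the inverse.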
-------------------------------------------------------------------------------
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
http://www.manning.com/books/spark-graphx-in-action






Re: Large-scale matrix inverse in Spark

Posted by Anastasios Zouzias <zo...@gmail.com>.
Hi there,

As Edward noted, if you ask a numerical analyst about matrix inversion,
they will respond "you never invert a matrix, you solve the linear
system associated with the matrix". Linear systems are usually solved
with iterative methods or matrix decompositions (as noted above). People
avoid explicit matrix inversion because of its inherently poor
numerical stability.
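
As a rough illustration of the iterative route, here is a sketch of the conjugate gradient method in plain Python. It assumes A is symmetric positive definite, and the names (dot, conjugate_gradient, matvec) are hypothetical, not a Spark API. The solver only ever touches A through a function computing x -> A x, and never forms an inverse.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)               # residual b - A x for the starting guess x = 0
    p = list(r)               # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break             # residual norm is small enough
        ap = matvec(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small dense example; in a distributed setting matvec would be the
# only step that needs to touch the full matrix.
a = [[4.0, 1.0], [1.0, 3.0]]
solution = conjugate_gradient(lambda v: [dot(row, v) for row in a], [1.0, 2.0])
print(solution)   # approximately [1/11, 7/11]
```

Each iteration costs one matrix-vector product plus a few vector operations, so the matrix can stay partitioned by rows.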

Best,
Anastasios

On Tue, Sep 27, 2016 at 8:42 AM, Edward Fine <ed...@gmail.com> wrote:

> I have not found matrix inversion algorithms in Spark and I would be
> surprised to see them.  Except for matrices with very special structure
> (like those near the identity), inverting an n*n matrix takes more than
> O(n^2) operations, which does not scale.  When a matrix must be inverted,
> a decomposition or a low-rank approximation is usually used, just as Sean
> pointed out.  See further
> https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra
> or, if you really want to dig into it, Stoer and Bulirsch:
> http://www.springer.com/us/book/9780387954523


-- 
-- Anastasios Zouzias
<az...@zurich.ibm.com>

Re: Large-scale matrix inverse in Spark

Posted by Edward Fine <ed...@gmail.com>.
I have not found matrix inversion algorithms in Spark and I would be
surprised to see them.  Except for matrices with very special structure
(like those near the identity), inverting an n*n matrix takes more than
O(n^2) operations, which does not scale.  When a matrix must be inverted,
a decomposition or a low-rank approximation is usually used, just as Sean
pointed out.  See further
https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra

or, if you really want to dig into it, Stoer and Bulirsch:
http://www.springer.com/us/book/9780387954523
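
To make the scaling concrete, here is a small plain-Python sketch that inverts a dense matrix by Gauss-Jordan elimination and counts the scalar updates it performs. The helper names (invert_counting_ops, sample_matrix) and the test matrix are assumptions chosen for illustration. This textbook method performs 2*n^3 updates in this implementation, so doubling n multiplies the work by 8.

```python
def invert_counting_ops(a):
    """Gauss-Jordan inversion with partial pivoting.

    Returns (inverse, ops), where ops counts the scalar entry updates.
    """
    n = len(a)
    # Augment A with the identity: [A | I].
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    ops = 0
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the diagonal entry becomes 1.
        scale = aug[col][col]
        for j in range(2 * n):
            aug[col][j] /= scale
            ops += 1
        # Eliminate this column from every other row.
        for r in range(n):
            if r == col:
                continue
            factor = aug[r][col]
            for j in range(2 * n):
                aug[r][j] -= factor * aug[col][j]
                ops += 1
    return [row[n:] for row in aug], ops


def sample_matrix(n):
    """Hypothetical test matrix: n on the diagonal, 1 everywhere else."""
    return [[float(n) if i == j else 1.0 for j in range(n)] for i in range(n)]


for n in (10, 20, 40):
    _, ops = invert_counting_ops(sample_matrix(n))
    print(n, ops)   # ops equals 2 * n**3 for this implementation
```

Asymptotically faster algorithms exist, but none reach O(n^2) for general dense matrices, which is the point above.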

On Mon, Sep 26, 2016 at 11:00 PM Sean Owen <so...@cloudera.com> wrote:

> I don't recall any code in Spark that computes a matrix inverse. There is
> code that solves linear systems Ax = b with a decomposition. For example,
> from looking at the code recently, I think the regression implementation
> actually solves A^T A x = A^T b using a Cholesky decomposition. But A is
> n x k, where n is large but k is smallish (the number of features), so
> A^T A is k x k and can be solved in memory with a library.

Re: Large-scale matrix inverse in Spark

Posted by Sean Owen <so...@cloudera.com>.
I don't recall any code in Spark that computes a matrix inverse. There is
code that solves linear systems Ax = b with a decomposition. For example,
from looking at the code recently, I think the regression implementation
actually solves A^T A x = A^T b using a Cholesky decomposition. But A is
n x k, where n is large but k is smallish (the number of features), so
A^T A is k x k and can be solved in memory with a library.
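
A rough plain-Python sketch of that pattern follows. It is not Spark's actual code; the "partitions" are ordinary lists standing in for distributed row blocks, and all names (partial_gram, merge, cholesky, cholesky_solve) and the data are hypothetical. Each partition contributes its share of A^T A and A^T b, the partial sums are merged, and the small k x k system is solved locally with a Cholesky factorization.

```python
from functools import reduce


def partial_gram(rows):
    """Compute (A_p^T A_p, A_p^T b_p) for one partition of (features, label) rows."""
    k = len(rows[0][0])
    ata = [[0.0] * k for _ in range(k)]
    atb = [0.0] * k
    for features, label in rows:
        for i in range(k):
            atb[i] += features[i] * label
            for j in range(k):
                ata[i][j] += features[i] * features[j]
    return ata, atb


def merge(left, right):
    """Add two partial results element-wise (the reduce step)."""
    ata = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left[0], right[0])]
    atb = [x + y for x, y in zip(left[1], right[1])]
    return ata, atb


def cholesky(m):
    """Return lower-triangular L with L L^T = m (m symmetric positive definite)."""
    k = len(m)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][t] * low[j][t] for t in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = s ** 0.5
            else:
                low[i][j] = s / low[j][j]
    return low


def cholesky_solve(low, rhs):
    """Solve (L L^T) x = rhs by forward then back substitution."""
    k = len(low)
    y = [0.0] * k
    for i in range(k):
        y[i] = (rhs[i] - sum(low[i][t] * y[t] for t in range(i))) / low[i][i]
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (y[i] - sum(low[t][i] * x[t] for t in range(i + 1, k))) / low[i][i]
    return x


# Made-up data with label = 2*x0 + 3*x1 exactly, split into two "partitions".
partitions = [
    [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0)],
    [([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)],
]
ata, atb = reduce(merge, map(partial_gram, partitions))
weights = cholesky_solve(cholesky(ata), atb)
print(weights)   # approximately [2.0, 3.0]
```

Only the k x k and length-k sums ever leave a partition, so the amount of data that has to be combined does not depend on n.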
