Posted to issues@spark.apache.org by "Chandan Misra (JIRA)" <ji...@apache.org> on 2018/02/01 09:52:00 UTC

[jira] [Commented] (SPARK-23266) Matrix Inversion on BlockMatrix

    [ https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348310#comment-16348310 ] 

Chandan Misra commented on SPARK-23266:
---------------------------------------

*How big is n typically for your use case?*
To give a sense of how large the datasets used in Kriging can be, the following paper might interest you:
 [http://www.tandfonline.com/doi/full/10.1080/2150704X.2016.1275053]
The dataset there has 650 million points and is 18 GB in size. I think inverting the variance-covariance matrix C is not feasible if it has to be processed locally on a single machine.
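
To make the scale concrete, here is a rough back-of-the-envelope sketch. It assumes that all n points enter a single dense n x n covariance matrix stored in 8-byte double precision; both the dense layout and the helper name `dense_matrix_bytes` are my assumptions for illustration, not something taken from the paper.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Storage for a dense n x n matrix (assumed: 8-byte doubles, no sparsity)."""
    return n * n * bytes_per_entry


n_points = 650_000_000  # number of points quoted above
size = dense_matrix_bytes(n_points)

# Report in decimal exabytes (1 EB = 10**18 bytes).
print(f"{size / 1e18:.2f} exabytes")
```

Under these assumptions the matrix alone would need about 3.4 exabytes, which is far beyond the memory of any single machine.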

*I'm also not clear how common this operation is?*

Kriging is used extensively in many fields, such as earth science, mining, weather prediction, and wireless sensor networks, and in remote sensing applications such as filling gaps in satellite raster images and creating Digital Elevation Models from LiDAR data, to name a few. It is backed by a large number of research papers. There are R packages implemented solely for Kriging, such as gstat and geoR, but these are limited to a single node and fail when given a large dataset.

Additionally, there has been ongoing research (like [this|https://www.spiedigitallibrary.org/journals/Journal-of-Applied-Remote-Sensing/volume-11/issue-1/016011/High-performance-parallel-approaches-for-three-dimensional-light-detection-and/10.1117/1.JRS.11.016011.short?SSO=1]) on parallelizing Kriging with MPI, Hadoop, and GPUs. One of the teams is [GIST at Oak Ridge National Laboratory|http://web.ornl.gov/sci/gist/res_high_performance.shtml], which performs geo-computation in an HPC setup. I think Spark can readily replace these alternatives given its benefits in this regard. Thus, since matrix inversion is a core processing component of Kriging, it is highly relevant, and a Spark implementation would provide a hassle-free solution for a large fraction of researchers outside computer science.

> Matrix Inversion on BlockMatrix
> -------------------------------
>
>                 Key: SPARK-23266
>                 URL: https://issues.apache.org/jira/browse/SPARK-23266
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Chandan Misra
>            Priority: Minor
>
> Matrix inversion is a basic building block for many other algorithms, such as regression, classification, and geostatistical analysis using ordinary kriging. An efficient distributed divide-and-conquer algorithm based on Spark BlockMatrix can be implemented using only *6* multiplications at each recursion level. The reference paper can be found at
> [https://arxiv.org/abs/1801.04723]
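
For readers unfamiliar with this kind of recursion, below is a minimal single-machine sketch of a Strassen-style block inversion, which is one well-known divide-and-conquer scheme that needs six block multiplications and two recursive inversions per level. Whether it matches the referenced paper step for step is an assumption on my part. It uses plain Python lists rather than Spark BlockMatrix, and the helper names (`matmul`, `sub`, `neg`, `block_inverse`) are hypothetical. It also assumes that every leading block and every Schur complement is invertible, which holds for symmetric positive definite matrices such as covariance matrices.

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    """Element-wise negation."""
    return [[-x for x in row] for row in a]


def block_inverse(a):
    """Invert a square matrix by recursive 2 x 2 block partitioning.

    Assumes all leading blocks and Schur complements are invertible
    (true for symmetric positive definite input); no pivoting is done.
    """
    n = len(a)
    if n == 1:
        return [[1.0 / a[0][0]]]

    # Split into four blocks; the halves need not be equal in size.
    h = n // 2
    a11 = [row[:h] for row in a[:h]]
    a12 = [row[h:] for row in a[:h]]
    a21 = [row[:h] for row in a[h:]]
    a22 = [row[h:] for row in a[h:]]

    inv11 = block_inverse(a11)         # recursive inversion 1
    m1 = matmul(a21, inv11)            # multiplication 1
    m2 = matmul(inv11, a12)            # multiplication 2
    m3 = matmul(a21, m2)               # multiplication 3
    v = sub(m3, a22)                   # negated Schur complement of a11
    inv_v = block_inverse(v)           # recursive inversion 2
    c12 = matmul(m2, inv_v)            # multiplication 4
    c21 = matmul(inv_v, m1)            # multiplication 5
    c11 = sub(inv11, matmul(m2, c21))  # multiplication 6
    c22 = neg(inv_v)

    # Reassemble the four result blocks into one matrix.
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

Calling `block_inverse(a)` on a small symmetric positive definite matrix and multiplying the result by `a` with `matmul` should give the identity up to floating-point error. In a distributed setting each `matmul` would presumably become a BlockMatrix product, which is where the cost per level is concentrated.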



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
