You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/12/29 10:00:24 UTC

[Lucene-hadoop Wiki] Update of "Hbase/Matrix" by udanax

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/Matrix

The comment on the change is:
I'll re-write after some transformation.

------------------------------------------------------------------------------
- [[TableOfContents(4)]]
+ deleted
  
-  * It's a designing stage.
- 
- ----
- == LINA, a Framework for Large-scale Sparse Linear Algebra ==
- 
- Using Hbase's Row,Column(Qualifier) two dimensional space, we are able to store large sparse matrix.
- [[BR]]The Auto-partitioned sparsity sub-structure will be efficiently managed and serviced by Hbase.
- 
- Row or Column operations can be done in linear time and algorithms such as structured Gaussian elimination
- [[BR]]or iterative methods run in '''O(~-the number of non-zero elements in the matrix-~)''' time, Hbase Providing Both Vertical and Horizontal access.
- 
- Therefore, using iterative algorithms on a parallel processing platform like !MapReduce on Hadoop, 
- [[BR]]we should be able to implement '''High Performance''' and '''World's Largest''' Matrix Computations.
- 
- === Applications by LINA ===
- 
- It will support a wide variety of applications in the domain of Physics, Linear Algebra, Relational Algebra, 
- [[BR]]Statistics, Graphics Rendering, Computational Dynamics and others.
- 
-  * Scientific simulation and modeling 
-   * Matrix-vector/matrix-matrix multiply 
-   * Soving linear systems 
-   * Collaborative filtering/recommendation systems
-  * Information retrieval 
-   * Sorting 
-   * Finding eigenvalues and eigenvectors 
-  * Computer graphics and computational geometry 
-   * Matrix multiply 
-   * computing matrix determinate 
- 
- === Initial Contributors ===
- 
-  * [:udanax:Edward Yoon] (R&D center, NHN corp.)
-   * I have profited by '''Prof. Samuel Kim''', '''Dr. Yongchan Park'''
- 
- == Storing and manipulating numeric, sparse matrices on Hadoop + Hbase ==
- 
- To store sparse matrix in hbase, I will use an abstracted table with a single column family, and tune existing partitioning conditions. 
- The extendable columnfamily feature combine matrices that have the same row dimension.
- 
- === Scheduling Algorithm ===
- 
- The scheduling algorithm is designed for driving the parallel execution of the factorization on on a MapReduce model.
- 
- NOTE : indirect and irregular, the inefficient data access ....
- 
- NOTE : blocking factor........ code generator...... 
- 
-  * Needs
-   * Total Row/Column Key Count in the matrix table header.
-   * Big floating point number, Big integer calculator
- 
- {{{
-                                               
- +--------------------+               +------+ | +--+ 
- |  .          .      |               |.     | | |. |     
- |       .      .     |               |     .| | | .|     
- |                    |               +------+ | +--+     
- |                    |    ---->   ------------+-----------
- |                    |              +-----+   |+-------+          
- | .  .       .     . |              |.  . |   ||.     .|
- |                    |              +-----+   |+-------+        
- +--------------------+                        |                                            
- 
- }}}
- 
- 
- === Sums on the MapReduce Tasks of Hadoop ===
- 
- == Examples ==
- 
- === Collaborative Filtering Via Ensembles of Matrix Factorizations ===
- 
- No work, no grub.
- 
- === Fast Page-Rank Computation Via Sparse Linear System ===
- 
- No work, no grub.
- 
- === Latent Semantic Analysis Via Singular Value Decomposition ===
- 
- No work, no grub.
- 
- == References ==
- 
-  * High performance numerical libraries in Java, Bjørn-Ove Heimsund
-  * Parallel Conjugate Gradients Assignment, a parallel implementation of the conjugate gradient algorithm
-  * ScaLAPACK, a library of high-performance linear algebra routines for distributed-memory message-passing MIMD computers 
-  * [http://bebop.cs.berkeley.edu/oski/ OSKI], optimized sparse kernel interface (OSKI) library
-   * [http://icl.cs.utk.edu/iclprojects/pages/files/sans/yelick-bebop.pdf Automatic Performance Tuning of Sparse Matrix Kernels], August 7, 2002
-  * Scheduling algorithms for parallel Gaussian elimination with communication costs, Amoura, A.K.; Bampis, E.; Konig, J.-C.
-  * Google's Page-Rank and Beyond: The Science of Search Engine Rankigs, Amy N. Langville; Carl D. Meyer
-