Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2010/06/27 15:02:25 UTC

[Hama Wiki] Trivial Update of "RoadMap" by Edward J. Yoon

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "RoadMap" page has been changed by Edward J. Yoon.
http://wiki.apache.org/hama/RoadMap?action=diff&rev1=15&rev2=16

--------------------------------------------------

+ = Plans for 0.2.0 release =
- = Short-term Issues (for 0.2.0 release) =
- 
- [[http://markmail.org/search/?q=hama-dev+discuss#query:hama-dev%20discuss+page:1+mid:amlvccbptom3yro3+state:results]]
- 
- == Re-factoring issues ==
  
   * Move current code related to matrix operations to the 'examples' package [[https://issues.apache.org/jira/browse/HAMA-243|HAMA-243]]
   * Design the structure of the matrix/graph
- 
- == BSP issues ==
- 
   * Consider a simpler BSP programming interface [[https://issues.apache.org/jira/browse/HAMA-244|HAMA-244]]
   * BSP examples [[https://issues.apache.org/jira/browse/HAMA-221|HAMA-221]]
   * Hadoop RPC performance analysis [[https://issues.apache.org/jira/browse/HAMA-245|HAMA-245]]
   * [documentation] Parallel and Distributed Programming with BSP in Hama [[https://issues.apache.org/jira/browse/HAMA-248|HAMA-248]]
+ ----
  
+ = Plans for 0.3.0 release =
- ----
- = Long-term Issues =
-  
- We plan to redesign Hama around the BSP model and to target shared-nothing systems consisting of several thousand commodity servers, which are generally called cloud computing environments.
- == Why BSP? ==
  
+  * Add an input/output system
+  * A more reliable fault-tolerant system
+  * Web UI tool for monitoring BSP job progress
  
+ And, ...
- With respect to the graph package, BSP is also necessary for Hama to process graph data efficiently in shared-nothing architectures. The essence of graph data is the connectivity between vertices. During processing, Hama needs not only a vertex's data but also the data of its adjacent vertices. Assume that a graph data set is partitioned into cohesive subgraphs, so that adjacent vertices are stored on the same physical storage or as near to each other as possible. Even with well-partitioned graphs, MapReduce cannot exploit this locality, since it reads input data sequentially and has no control over its input data. In addition, its partitioner hashes the input data. The BSP model, however, enables graph processing to be performed efficiently while preserving the locality of graph data.
- 
- 
- === Design Considerations ===
-  * Fault Tolerance - Hama aims to run on several thousand commodity servers, so faults are to be expected. In addition, Hama is intended for large-scale processing that generally takes a long time, ranging from a few minutes to several hours. It is therefore important that Hama can finish a given job even when faults occur during processing; otherwise, Hama has to restart all jobs.
-  * Heterogeneity - 
-  * Efficiency
-  * Easy to Use
- (Working)
- === TODO ===
-  * Survey matrix and graph processing algorithms based on the BSP programming model.
-  * Develop a fault-tolerance mechanism for the BSP model.
-  * Develop a straggler-handling mechanism for the BSP model.
-  * Implement a BSP framework based on the source code that we have already written.
-  * Select the primitive operations for matrix processing and linear algebra.
-  * Implement the primitive operations for matrix processing and linear algebra.
-  * Develop operation models based on the primitive operations developed above.
-  * Implement a processing framework for matrix processing and linear algebra.
-  * Design a domain-specific language that reflects algebraic characteristics well.
-  * Select the primitive operations for graph processing.
-  * Develop operation models based on the above primitive operations for large-scale graph processing.
- (Working)
  
  = Idea Generating and Research Tasks =