Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/08 22:17:00 UTC

[Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

The comment on the change is:
Added some introductory text.   Added a couple of comments.

------------------------------------------------------------------------------
  ----
  
  = Hbase Shell Plan Draft =
+ The plan is to significantly expand the set of shell operators.  Basic data manipulation and data definition operators will be extended and evolved to be more SQL-like ([:Hbase/HbaseShell/HQL HQL]).  More sophisticated manipulations, such as relational and linear algebra, matrix addition, and multiplication, will be added to an HBase subshell to keep the two operator types (SQL-like vs. non-SQL) distinct.
+ 
+ This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608], which adds "Relational Algebra Operators", is currently in progress.
+ 
  == People Involved ==
   * '''Syntax definition.'''
    * [:udanax:Edward Yoon], Master.[[BR]]Open Collaboration, NHN corp.
    * Inchul Song, Ph.D. Candidate[[BR]]Database Lab[[BR]]Division of Computer Science, KAIST
  
  If you have constructive ideas, please contact me at webmaster@udanax.org.
+ 
+ ''~-This page looks great. I've added comments below.  Please remove them once you are done with them. -- St.Ack-~''
  
  == Suggested Hbase Shell plans ==
  === Hbase Query Language ===
@@ -44, +50 @@

   * Simplified processing of a logical data model
   * Simplified algebraic operations
   * Simplified Parallel Numerical Analysis by abstracting/numericalizing point, line, or plane data across multiple maps in HBase.
+ 
+  ''~-Does the import/export above include being able to write HQL/altools scripts and feed them to the interpreter on stdin, or to pass the interpreter a script file?  It would also be sweet if the interpreter could be invoked with a flag stating how results are to be output.  ASCII tables could remain the default, as they are now, but users will likely want unformatted output or output formatted as XML, etc.  Something to think about.  Also, Edward, I'd suggest that you would be doing yourself a service if you added citations for concepts like 'Parallel Numerical Analysis'.  It will help folks like myself who do not know what this means.  Thanks. -- St.Ack -~''
  === HBase altools Background ===
  I expect Hadoop + Hbase to handle sparsity and data explosion very well in the near future. Moreover, I believe the design of the multi-dimensional map structure and the 3D space model of the data are optimized for rapid ad hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. This is advantageous for '''Analysis Processing''', as it allows users to easily formulate complex queries and to filter or slice data into meaningful subsets, among other things.
  
@@ -63, +71 @@

  
  If we decide to fix the data model at a single time version, we may view the resulting data as a 2D table. If the index is a string, we may view it as a huge map; if the index is an integer, it is one huge 2D array. Each table may therefore hold such data stores along a third dimension (column families). A locality group (of column families) is a relationship that can occur between multiple references whenever one reference brings in much of the data used by the other references.
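To make these views concrete, here is a minimal Java sketch. It assumes a hypothetical in-memory stand-in for one table (row key, then "family:qualifier" column, then timestamp, then value) rather than the HBase client API; `VersionedTable`, `sliceAt`, and `toArray` are illustrative names only. Fixing a timestamp collapses the time dimension into a 2D map, and when row keys and qualifiers are integers, one column family of that map can be read as a 2D array.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one table (NOT the HBase client API):
// row key -> column ("family:qualifier") -> timestamp -> value.
public class VersionedTable {
    private final Map<String, Map<String, NavigableMap<Long, String>>> cells = new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        cells.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(timestamp, value);
    }

    // Fixes the time dimension: for each cell, keep the newest value written at or before ts.
    Map<String, Map<String, String>> sliceAt(long ts) {
        Map<String, Map<String, String>> view = new TreeMap<>();
        for (Map.Entry<String, Map<String, NavigableMap<Long, String>>> row : cells.entrySet()) {
            for (Map.Entry<String, NavigableMap<Long, String>> cell : row.getValue().entrySet()) {
                Map.Entry<Long, String> version = cell.getValue().floorEntry(ts);
                if (version != null) {
                    view.computeIfAbsent(row.getKey(), r -> new TreeMap<>())
                        .put(cell.getKey(), version.getValue());
                }
            }
        }
        return view;
    }

    // With integer row keys and qualifiers, one column family of the 2D slice is a
    // sparse 2D array; absent cells read as 0.0. Assumes all indices fit rows x cols.
    static double[][] toArray(Map<String, Map<String, String>> view, String family, int rows, int cols) {
        double[][] m = new double[rows][cols];
        String prefix = family + ":";
        for (Map.Entry<String, Map<String, String>> row : view.entrySet()) {
            int i = Integer.parseInt(row.getKey());
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (cell.getKey().startsWith(prefix)) {
                    int j = Integer.parseInt(cell.getKey().substring(prefix.length()));
                    m[i][j] = Double.parseDouble(cell.getValue());
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        VersionedTable t = new VersionedTable();
        t.put("0", "cf_1:0", 1L, "1.0");
        t.put("0", "cf_1:0", 5L, "2.0"); // newer version of the same cell
        t.put("1", "cf_1:1", 1L, "3.0");
        double[][] atT1 = toArray(t.sliceAt(1L), "cf_1", 2, 2);
        double[][] atT9 = toArray(t.sliceAt(9L), "cf_1", 2, 2);
        System.out.println(atT1[0][0] + " " + atT9[0][0]); // 1.0 2.0
    }
}
```

Using `floorEntry` picks the newest version at or before the chosen timestamp, which is one plausible reading of fixing the model at a time version; an exact-timestamp match would be an equally valid assumption.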
  
+ ''~-I think people may also start to ask as your operators evolve: 'What is the difference between HBase Shell and Yahoo! PIG?' -- St.Ack-~''
+ 
  ----
  
  === Suggested Hbase altools Syntax ===
@@ -74, +84 @@

  ||Matrix ||<99%>The '''Matrix''' command constructs a logical matrix from the given table and column family. [[BR]][[BR]]~-''M = Matrix(table_name, columnfamily_name[, scalar S]);''-~ ||
  ||Substitute ||<99%>'''Substitute''' assigns an expression to a single-letter variable [A-Z].[[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
  ||Store ||<99%>The '''Store''' command stores results in the specified table (or file). [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~ ||
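The Substitute and Store rows imply a small statement grammar with single-letter variables. The hedged Java sketch below assumes a regex-based parser (the plan does not say how statements will be parsed) and records only which table each variable is bound to; `AltoolsBindings`, `execute`, and `lookup` are hypothetical names, and the `file(...)` form of Store is not handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed Substitute and Store statements.
// Variables are a single upper-case letter [A-Z]; tables are tracked by name only.
public class AltoolsBindings {
    private static final Pattern SUBSTITUTE =
        Pattern.compile("^\\s*([A-Z])\\s*=\\s*Table\\('([^']+)'\\)\\s*;\\s*$");
    private static final Pattern STORE =
        Pattern.compile("^\\s*Store\\s+([A-Z])\\s+TO\\s+table\\('([^']+)'\\)\\s*;\\s*$");

    // variable letter -> table the variable is bound to
    private final Map<Character, String> vars = new HashMap<>();
    // target table -> source table (a stand-in for actually copying the rows)
    final Map<String, String> stored = new HashMap<>();

    // Executes one statement; anything else (e.g. a multi-letter name) is rejected.
    void execute(String stmt) {
        Matcher m = SUBSTITUTE.matcher(stmt);
        if (m.matches()) {
            vars.put(m.group(1).charAt(0), m.group(2));
            return;
        }
        m = STORE.matcher(stmt);
        if (m.matches()) {
            String source = vars.get(m.group(1).charAt(0));
            if (source == null) {
                throw new IllegalArgumentException("unbound variable " + m.group(1));
            }
            stored.put(m.group(2), source);
            return;
        }
        throw new IllegalArgumentException("unrecognized statement: " + stmt);
    }

    String lookup(char var) {
        return vars.get(var);
    }

    public static void main(String[] args) {
        AltoolsBindings shell = new AltoolsBindings();
        shell.execute("A = Table('movieLog_table');");
        shell.execute("Store A TO table('tmp_table');");
        System.out.println(shell.stored); // {tmp_table=movieLog_table}
        try {
            shell.execute("AB = Table('movieLog_table');"); // multi-character name
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Allowing multi-character variables would only require widening the capture group, e.g. `([A-Z][A-Za-z0-9_]*)`, with the map keyed by `String` instead of `Character`.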
+ 
+ ''~-Do you intend to allow multi-character variables?  I think you should if possible. -- St.Ack-~''
+ 
  ==== Relational Operators ====
  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
  ||Projection ||<99%>'''Projection''' of a relation ~+R+~ makes a new relation from the set obtained when all tuples (rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length');''-~ ||
  ||Selection ||<99%>'''Selection''' of a relation ~+R+~ makes a new relation from the specified tuples (rows) of ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox');''-~ ||
  ||Group ||<99%>'''Group''' groups tuples by the value of an attribute and applies an aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Group('studioName', MIN('year'));''-~ ||
  ||Sort ||<99%>'''Sort''' orders the tuples (rows) of ~+R+~ according to the column families in the column-family list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length');''-~ ||
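The four relational operators above can be sketched in memory. This is a minimal illustration, assuming a relation is a list of rows keyed by column name and that numeric columns hold parseable numbers; `RelationalOps` and its methods are hypothetical, only `MIN` is shown among the aggregate functions, and the sample movie rows are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory versions of Projection, Selection, Group (MIN) and Sort.
// A relation is a list of rows; a row maps a column name to its value.
public class RelationalOps {

    // Projection: restrict each row to the given columns; duplicates collapse (set semantics).
    static List<Map<String, String>> projection(List<Map<String, String>> r, String... cols) {
        return r.stream()
                .map(row -> {
                    Map<String, String> p = new LinkedHashMap<>();
                    for (String c : cols) {
                        if (row.containsKey(c)) p.put(c, row.get(c));
                    }
                    return p;
                })
                .distinct()
                .collect(Collectors.toList());
    }

    // Selection: keep rows satisfying the predicate (AND/OR/NOT -> and/or/negate).
    static List<Map<String, String>> selection(List<Map<String, String>> r,
                                               Predicate<Map<String, String>> p) {
        return r.stream().filter(p).collect(Collectors.toList());
    }

    // Group + MIN: per distinct value of groupCol, the minimum numeric value of aggCol.
    // Assumes every row has both columns.
    static Map<String, Double> groupMin(List<Map<String, String>> r, String groupCol, String aggCol) {
        Map<String, Double> out = new TreeMap<>();
        for (Map<String, String> row : r) {
            out.merge(row.get(groupCol), Double.parseDouble(row.get(aggCol)), Double::min);
        }
        return out;
    }

    // Sort: order rows ascending by a numeric column (stable for equal keys).
    static List<Map<String, String>> sortBy(List<Map<String, String>> r, String col) {
        List<Map<String, String>> out = new ArrayList<>(r);
        out.sort(Comparator.comparingDouble(row -> Double.parseDouble(row.get(col))));
        return out;
    }

    // Builds one made-up movieLog row.
    static Map<String, String> movie(String title, String year, String length, String studio) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("title", title);
        m.put("year", year);
        m.put("length", length);
        m.put("studioName", studio);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, String>> a = List.of(
            movie("m1", "1990", "120", "Fox"),
            movie("m2", "1985", "95", "Fox"),
            movie("m3", "2001", "140", "Disney"));
        // B = A.Selection(length > 100 AND studioName = 'Fox');
        Predicate<Map<String, String>> longFilm = row -> Double.parseDouble(row.get("length")) > 100;
        Predicate<Map<String, String>> fox = row -> "Fox".equals(row.get("studioName"));
        System.out.println(selection(a, longFilm.and(fox)).size()); // 1
        // B = A.Group('studioName', MIN('year'));
        System.out.println(groupMin(a, "studioName", "year")); // {Disney=2001.0, Fox=1985.0}
    }
}
```

Projection removes duplicate projected rows to match the "set" wording in the table; a bag (multiset) semantics, as SQL uses by default, would simply drop the `distinct()` call.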
+ 
+ ''~-Again, to help readers, you might cite pages that explain 'relational algebra' or examples of its use in databases to help contextualize your plan.  (Aren't there other relational operators than these that might be included?  Do you intend to implement those?  If not, you might say why not, or if you intend to do these as 'Matrix Arithmetic Operators', you might say so.) -- St.Ack-~''
  
  ==== Matrix Arithmetic Operators ====
  ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
@@ -100, +115 @@

  ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = getU(B);''-~||
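The SVD row describes the factorization but not how it might be computed. As a hedged illustration only, the Java sketch below computes a thin SVD with one-sided (Hestenes) Jacobi rotations, a standard textbook method swapped in here rather than anything the plan specifies; the class `JacobiSvd` and its fields are hypothetical and unrelated to the proposed `getU()`/`getS()` functions. It works on a dense in-memory array, not on HBase tables, and returns the singular values unsorted.

```java
import java.util.Arrays;

// Thin SVD A = U * diag(s) * V^T for m-by-n A with m >= n, via one-sided Jacobi.
// Illustrative sketch only; singular values are NOT sorted.
public class JacobiSvd {
    final double[][] u; // m-by-n; columns orthonormal when A has full column rank
    final double[] s;   // the n singular values
    final double[][] v; // n-by-n orthogonal

    JacobiSvd(double[][] a) {
        int m = a.length, n = a[0].length;
        if (m < n) throw new IllegalArgumentException("expects m >= n");
        u = new double[m][];
        for (int i = 0; i < m; i++) u[i] = a[i].clone(); // U starts as a copy of A
        v = new double[n][n];
        for (int i = 0; i < n; i++) v[i][i] = 1.0;         // V starts as the identity

        boolean rotated = true;
        for (int sweep = 0; sweep < 60 && rotated; sweep++) {
            rotated = false;
            for (int p = 0; p < n - 1; p++) {
                for (int q = p + 1; q < n; q++) {
                    double alpha = 0, beta = 0, gamma = 0;
                    for (int i = 0; i < m; i++) {
                        alpha += u[i][p] * u[i][p];
                        beta += u[i][q] * u[i][q];
                        gamma += u[i][p] * u[i][q];
                    }
                    // Columns p and q are already numerically orthogonal: skip.
                    if (Math.abs(gamma) <= 1e-12 * Math.sqrt(alpha * beta)) continue;
                    rotated = true;
                    // Rotation that zeroes the inner product of columns p and q.
                    double zeta = (beta - alpha) / (2 * gamma);
                    double t = (zeta >= 0 ? 1 : -1) / (Math.abs(zeta) + Math.sqrt(1 + zeta * zeta));
                    double c = 1 / Math.sqrt(1 + t * t);
                    double sn = c * t;
                    rotate(u, p, q, c, sn);
                    rotate(v, p, q, c, sn); // same rotation keeps A * V == U
                }
            }
        }
        // The column norms of the orthogonalized U are the singular values.
        s = new double[n];
        for (int j = 0; j < n; j++) {
            double norm = 0;
            for (int i = 0; i < m; i++) norm += u[i][j] * u[i][j];
            s[j] = Math.sqrt(norm);
            if (s[j] > 0) {
                for (int i = 0; i < m; i++) u[i][j] /= s[j];
            }
        }
    }

    // Applies a plane rotation to columns p and q of x.
    private static void rotate(double[][] x, int p, int q, double c, double sn) {
        for (double[] row : x) {
            double xp = row[p], xq = row[q];
            row[p] = c * xp - sn * xq;
            row[q] = sn * xp + c * xq;
        }
    }

    // Rebuilds U * diag(s) * V^T so the factorization can be checked.
    double[][] reconstruct() {
        int m = u.length, n = s.length;
        double[][] a = new double[m][n];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    a[i][j] += u[i][k] * s[k] * v[j][k];
        return a;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 0}, {0, 1}, {1, 1}};
        JacobiSvd svd = new JacobiSvd(a);
        System.out.println("singular values: " + Arrays.toString(svd.s));
        System.out.println("U*S*V' = " + Arrays.deepToString(svd.reconstruct()));
    }
}
```

For the 3-by-2 matrix in `main`, AᵀA = [[2, 1], [1, 2]] has eigenvalues 3 and 1, so the singular values should come out close to √3 and 1, and the reconstruction should match A up to floating-point error. A fuller version would sort the singular values in descending order and permute the columns of U and V to match.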
  ----
  = Implementation =
+ 
+ ''~-I'd suggest you add a section on the framework you intend to use for developing operators (Will you be using javacc to parse the shell inputs?  Will some operators run mapreduce jobs?  Will output always be ASCII or will it be specifiable?). -- St.Ack-~''
  
  '''Note''' : ''We should first test on local machines. -- udanax''
  [[BR]] ''Code Style Formatter'' [attachment:uncle-jim-code-style.xml]