Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/19 02:22:18 UTC

[Lucene-hadoop Wiki] Update of "Hbase/HbaseShell" by udanax

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseShell

------------------------------------------------------------------------------
- [[TableOfContents(5)]]
+ [[TableOfContents(4)]]
- ----
- 
- = Hbase Shell Plan Draft =
- Plan is to significantly expand the set of shell operators.  Basic data manipulation and data definition operators will be extended and evolved to be more SQL-like ([wiki:Hbase/HbaseShell/HQL HQL]).  More sophisticated manipulations to do relational and linear algebra, matrix additions, multiplications, etc., will be added to a HBase subshell to keep the two operator types -- SQL-like vs. non-SQL -- distinct.
- 
-  ''-- After POC(proof of concept) review, many things can change.[[BR]]-- If you have constructive ideas, Please advise me. [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]''
- 
- This project is currently in the planning stage.  [https://issues.apache.org/jira/browse/HADOOP-1608 HADOOP-1608] to add "Relational Algrebra Operators" is currently in process.
  
  ----
+ = Hbase Shell Introduction =
+ Hbase Shell is a basic, interactive command-line 'shell' for manipulating tables in Hbase. It supports a small set of SQL-inspired operations and presents results in an ASCII-table format.
  
- == Suggested Hbase Shell altools plans ==
- I suggest to develop HBase Shell in SQL-style, and develop '''al'''gebraic '''tools''' as a sub shell in Intuitionalized-style as described below. 
+ The Hbase Shell aims to be to Hbase what the mysql command-line client is to mysqld, and what sqlplus is to Oracle.
+ 
+ Hbase Shell was first added to TRUNK in July 2007.
+ 
+  * [http://issues.apache.org/jira/browse/hadoop-1720 HADOOP-1720] to update "[wiki:Hbase/HbaseShell/HQL HQL]" is currently in progress.
+  * See [wiki:Hbase/ShellPlans Hbase Shell plans] page for discussion and description of future operators. The intent is to add more support for non-interactive usage as well as operators for algebraic, relational, and matrix manipulations. 
+ 
+ == People Involved ==
+  * [:udanax:Edward Yoon] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]] (Research and Development center, NHN corp.) -- Initial contributor
+  * [:InchulSong:Inchul Song] [[MailTo(icsong AT SPAMFREE gmail DOT com)]] (Database Lab, KAIST)
+ 
+ ----
+ = How to Start a Shell =
+ Run the following on the command-line:
+ 
+ {{{${HBASE_HOME}/bin/hbase shell}}}
+ 
+ You will be presented with the following prompt:
+ 
+ {{{HBase Shell, 0.0.1 version.
+ Copyright (c) 2007 by udanax, licensed to Apache Software Foundation.
+ Type 'help;' for usage.
+ 
+ HBase >}}}
+ 
+ All commands are terminated with a semicolon. For example, type 'help;' to see the list of available commands.
+ 
+ = Hbase Shell Commands =
+ '''Note''' that data is located by its row, column, and timestamp.
+ 
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||Help ||<99%>'''Help''' command provides information about how to use the shell and its commands.[[BR]][[BR]]~-''HELP [function_name];''-~ ||
+ ||Show ||<99%>'''Show''' command lists tables ''or files (DFS)''.[[BR]][[BR]]~-''SHOW tables[ or files];''-~ ||
+ ||Describe ||'''Describe''' command provides information about the columnfamilies in a table.[[BR]][[BR]]~-''DESC table_name;''-~ ||
+ ||Create ||'''Create''' command creates a new table.[[BR]][[BR]]~-''CREATE table_name[[BR]]COLUMNFAMILIES('columnfamily_name1'[, 'columnfamily_name2', ...])[[BR]][LIMIT=limitNumber_of_Version];''-~ ||
+ ||Drop ||'''Drop''' command drops one or more tables, or columnfamilies in a table.[[BR]][[BR]]~-''DROP table_name1[, table_name2, ...] or columnfamily_name1[, columnfamily_name2, ...];''-~ ||
+ ||Clear ||<99%>'''Clear''' the screen.[[BR]][[BR]]~-''CLEAR;''-~ ||
+ ||Exit ||<99%>'''Exit''' from the current shell.[[BR]][[BR]]~-''EXIT;''-~ ||
+ The following commands manipulate data at a finer level of detail:
+ ||<bgcolor="#ececec">'''Command''' ||<bgcolor="#ececec">'''Explanation''' ||
+ ||Insert ||<99%>'''Insert''' command inserts one row into the table with values for the specified columns.[[BR]][[BR]]~-''INSERT table_name ('columnfamily_name1:column_key'[, 'columnfamily_name2:column_key', ...])[[BR]] VALUES ('entry1'[, 'entry2', ...])[[BR]]WHERE row='row_key';''-~ ||
+ ||Delete ||'''Delete''' command deletes the specified rows in a table. [[BR]][[BR]]~-''DELETE table_name[[BR]]WHERE row='row_key'[[BR]][AND column='columnfamily_name:column_key'];''-~ ||
+ ||Select ||<99%>'''Select''' command retrieves rows from a table.[[BR]][[BR]]~-''SELECT table_name[[BR]][WHERE row='row_key'][[BR]][AND column='columnfamily_name:column_key'][[BR]][AND time='Specified_Timestamp'][[BR]][LIMIT=Number_of_Version];''-~ ||
+ 
+ ----
+ = Example Of Hbase Shell Use =
+ == Create a table in Hbase ==
  
  {{{
- HBase > altools;
+ HBase > CREATE movieLog_table
+     --> COLUMNFAMILIES('year', 'length', 'inColor', 'studioName', 'vote', 'producer')
+     --> LIMIT=1; 
  
+ HBase > CREATE movieStar_table
+     --> COLUMNFAMILIES('biography', 'filmography', 'gender', 'birthDate')
+     --> LIMIT=1;
- Hbase altools, 0.0.1 version
- Type 'help;' for Hbase altools usage.
- 
- Hbase.altools > who are you;
- 
-  Hadoop + Hbase based algebraic manipulation tools
- 
- Hbase.altools > exit;
- Hbase > exit;
  }}}
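+ 
+ == Check the new tables ==
+ To confirm that both tables exist and carry the expected columnfamilies, the Show and Describe commands from the command table can be applied to the tables created here. This is only a sketch that reuses the documented syntax; it has not been checked against a running shell, and the output is omitted because its layout depends on the shell version.
+ {{{
+ HBase > SHOW tables;
+ 
+ HBase > DESC movieLog_table;
+ 
+ HBase > DESC movieStar_table;
+ }}}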
  
- Hbase altools is an Hbase Shell sub 'interpreter' (or 'shell)' program to provide scalable data processing capabilities like  aggregation, algebraic calculation(groups and sets, commutative rings, algebraic geometry, and linear algebra) on Hadoop + Hbase based parallel machines. especially, it will focus on storing and manipulating very large sparse matrices on Hbase.
+ == Insert data into a table ==
+ {{{
+ HBase > INSERT movieLog_table ('year:', 'length:', 'inColor:', 'studioName:', 'vote:user_1', 'producer:')
+     --> VALUES ('1977', '124', 'true', 'Fox', '5', 'George Lucas')
+     --> WHERE row='Star Wars';
  
-  ''-- Altools Matrix operations will show how Google search's LSI, Google Earth's algebraic topology, Google News' recommendation system are related to Bigtable.''
  
- === Background ===
- I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future. Moreover, i believe the design of the multi-dimensional map structure and the 3d space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to Analysis Processing as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things.
+ HBase > INSERT movieStar_table ('biography:', 'filmography:Star Wars', 'gender:', 'birthDate:')
+     --> VALUES ('blah~', 'starring', 'male', 'March 31, 1971')
+     --> WHERE row='Ewan Gordon Mc.Gregor'; 
+ }}}
  
- ----
+ == Show all data in a table ==
+ {{{
+ HBase > SELECT movieLog_table;
+ }}}
  
+ ||Row Key ||<-12>Column Families ||
+ ||<rowbgcolor="#ececec">title ||<-2> year ||<-2>length ||<-2>inColor ||<-2> studioName ||<-2> vote ||<-2> producer ||
+ ||Star Wars ||year: || 1977 ||length: || 124 ||inColor: || true ||studioName: || Fox || vote:''user_1'' || 5 || producer: || George Lucas ||
+ || || || || || || || || || || vote:''user_2'' || 2 || || ||
+ ||Mighty Ducks ||year: || 1991 ||length: || 104 ||inColor: || true ||studioName: || Disney || vote:''user_1'' || 2 || producer: || Blair Peters ||
+ || || || || || || || || || || vote:''user_3'' || 4 || || ||
+ ||Wayne's World ||year: || 1992 ||length: || 95 ||inColor: || true ||studioName: || Paramount || vote:''user_2'' || 3 || producer: || Penelope Spheeris ||
+ || || || || || || || || || || vote:''user_3'' || 4 || || ||
- == Suggested Hbase altools Syntax ==
- '''Note''' that Data should be located by their row, column, and timestamp.
- 
- === Commands ===
- ||<bgcolor="#E5E5E5">'''Command''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
- ||Table ||<99%>'''Table''' command loads specified table. [[BR]][[BR]]~-''Table('movieLog_table');''-~ ||
- ||Matrix ||<99%>'''Matrix''' command constructs the configuration of the logic matrix.[[BR]]'''Options''' : features not yet. [[BR]][[BR]]~-''Matrix(table_name, columnfamily_name[, option]);''-~ ||
- ||Substitute ||<99%>'''Substitute''' expression to [A~Z][[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
- ||IF...ELSE ||<99%>'''IF...ELSE''', Imposes conditions on the execution. [[BR]][[BR]]~-''IF ( boolean_expression )[[BR]]B = command_statements;[[BR]]ELSE[[BR]]B = command_statements;''-~||
- ||Store ||<99%>'''Store''' command will store results to specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~ ||
- 
- === Relational Operators ===
- ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
- ||Projection ||<99%>'''Projection''' of a relation ~+R+~, It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length'); '''//π,,year.length,,(A)''' ''-~ ||
- ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox'); '''//σ,,length > 100.studioName='Fox',,(A)''' ''-~ ||
- ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from two different internal source.[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C = R.naturalJoin(S); '''//C = R▷◁S''' ''-~ ||
- ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table);[[BR]]B = A.Group('studioName', MIN('year')); '''//γ,,studioName.MIN( year ),,(A)''' ''-~ ||
- ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies on columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length'); '''//τ,,length,,(A)''' ''-~ ||
- 
- '''(ex. 1)''' Search the subject and the year of the movies which were produced by 'Fox' company and where running time is more than 100 minutes.
- [[BR]]~-''π ,,title.year,, (σ ,,length > 100,, (movieLog_table) ∩ σ ,,studioName = 'Fox',, (movieLog_table))''-~
  
  {{{
+ HBase > SELECT movieStar_table;
- Hbase.altools > A = Table('movieLog_table'); 
- Hbase.altools > B = A.Selection(length > 100 AND studioName = 'Fox'); 
- Hbase.altools > C = B.Projection('year'); 
- 
- Hbase.altools > store C to table('result_table'); 
  }}}
  
- '''(ex. 2)''' Theta Join : ▷◁,,C,,
- [[BR]]~-''movieStars_table▷◁,,actor < year,,movieLog_table''-~
+ ||Row Key ||<-8>Column Families ||
+ ||<rowbgcolor="#ececec">starName ||<-2> biography ||<-2>filmography ||<-2>gender ||<-2> birthDate ||
+ ||Ewan Gordon Mc.Gregor ||biography: ||blah blah ||filmography:Star Wars ||starring ||gender: ||male ||birthDate: ||March 31, 1971 ||
+ || || || ||filmography:Emma ||extra || || || || ||
+ ||Kenan Thompson ||biography: ||blah blah ||filmography:Mighty Ducks ||starring ||gender: ||male ||birthDate: ||May 10, 1978 ||
+ || || || ||filmography:Big Fat Liar  ||cameo || || || || ||
+ ||Keanu Reeves ||biography: ||blah blah ||filmography:Constantine ||starring ||gender: ||male ||birthDate: ||September 2, 1964||
+ || || || ||filmography:The Matrix Reloaded ||starring || || || || ||
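+ 
+ == Select or delete part of a table ==
+ Because data is located by row, column, and timestamp, the optional WHERE and AND clauses of Select and Delete should narrow a command to a single row or a single cell. The statements below are a sketch that only recombines the syntax given in the command table with the sample rows shown here; they have not been checked against a running shell, so treat the exact behaviour as an assumption.
+ {{{
+ HBase > SELECT movieLog_table
+     --> WHERE row='Star Wars';
+ 
+ HBase > SELECT movieLog_table
+     --> WHERE row='Star Wars'
+     --> AND column='vote:user_1';
+ 
+ HBase > DELETE movieLog_table
+     --> WHERE row='Star Wars'
+     --> AND column='vote:user_1';
+ }}}
+ The first Select is expected to return every column of the 'Star Wars' row, the second only the 'vote:user_1' cell, and the Delete to remove that one cell while leaving the rest of the row in place.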
  
- {{{
- Hbase.altools > A = Table('movieStars_table'); 
- Hbase.altools > B = Table('movieLog_table');
- Hbase.altools > C = A.thetaJoin(B);
- 
- Hbase.altools > store C to table('result_table'); 
- }}}
- 
- === Matrix Arithmetic Operators ===
- ||<bgcolor="#E5E5E5">'''Operator''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
- ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B; '''// c,,ij,, = a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ ||
- ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B; '''// c,,ij,, = a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ ||
- ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of two matrices A and B.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A * B; '''//C = A · B''' ''-~ ||
- ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B; '''// C = A / B''' ''-~||
- ||Transpose ||<99%>'''Transpose''' of a Matrix, A matrix which is formed by turning all the rows of a given matrix into columns and vice-versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Transpose(A); '''// B = A'''' ''-~||
- 
- '''(ex. 1)''' The product C of two matrices A and B
- [[BR]]~-''C,,ij,, = ΣA,,ik,,B,,kj,, (1 ≤ i ≤ m , 1 ≤ j ≤n)''-~
- 
- {{{
- Hbase.altools > A = Matrix('m_table','cf_1');
- Hbase.altools > B = Matrix('m_table','cf_2');
- Hbase.altools > C = A * B;  
- }}}
- 
- === Factorizations and Decompositions ===
- 
- ||<bgcolor="#E5E5E5">'''Function''' ||<bgcolor="#E5E5E5">'''Explanation''' ||
- ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = getU(B);[[BR]]D = getL(A);''-~||
- ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = getH(B);''-~||
- ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition applicable only if matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = getL(A);''-~||
- ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), getSingularValues()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = getU(B);''-~||
- 
- '''(ex. 1)''' To find the Singular Value decomposition in Altools, do the following:
- [[BR]]~-''M = UΣV*''-~
- 
- {{{
- Hbase.altools > M = Matrix('m_table','cf_1'); //Set up the matrix M from mapped matrix in hbase.
- Hbase.altools > U = M.getU();
- Hbase.altools > V = M.getV();
- }}}
-