Posted to commits@phoenix.apache.org by ja...@apache.org on 2014/04/20 20:23:07 UTC

svn commit: r1588814 - in /incubator/phoenix/site: publish/pig_integration.html source/src/site/markdown/pig_integration.md

Author: jamestaylor
Date: Sun Apr 20 18:23:06 2014
New Revision: 1588814

URL: http://svn.apache.org/r1588814
Log:
Added documentation for Pig Loader (RaviMagham)

Modified:
    incubator/phoenix/site/publish/pig_integration.html
    incubator/phoenix/site/source/src/site/markdown/pig_integration.md

Modified: incubator/phoenix/site/publish/pig_integration.html
URL: http://svn.apache.org/viewvc/incubator/phoenix/site/publish/pig_integration.html?rev=1588814&r1=1588813&r2=1588814&view=diff
==============================================================================
--- incubator/phoenix/site/publish/pig_integration.html (original)
+++ incubator/phoenix/site/publish/pig_integration.html Sun Apr 20 18:23:06 2014
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2014-04-02
+ Generated by Apache Maven Doxia at 2014-04-20
  Rendered using Reflow Maven Skin 1.1.0 (http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -142,7 +142,23 @@ STORE A into 'hbase://CORE.ENTITY_HISTOR
 </div> 
 <div class="section"> 
  <h2 id="Pig_Loader">Pig Loader</h2> 
- <p>A Pig data loader is not yet implemented, but there is work in progress tracked by <a class="externalLink" href="https://issues.apache.org/jira/browse/PHOENIX-11">this</a> JIRA.</p> 
+ <p>A Pig data loader allows users to read data from Phoenix-backed HBase tables within a Pig script.</p>
+ <p>The load function provides two alternative ways to load data. 1. Given a table name: A = load '<a class="externalLink" href="hbase://table/HIRES">hbase://table/HIRES</a>' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');</p>
+ <div class="source"> 
+  <pre>The above loads data for all columns in the HIRES table.
+To restrict the list of columns, specify the column names as part of the LOAD statement, as below:
+   A = load 'hbase://table/HIRES/ID,NAME'  using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
+ Here, only data for the ID and NAME columns is returned.
+</pre> 
+ </div> 
+ <ol style="list-style-type: decimal"> 
+  <li>Given a query: A = load '<a class="externalLink" href="hbase://query/SELECT">hbase://query/SELECT</a> ID,NAME FROM HIRES WHERE AGE &gt; 50' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost'); The above query loads all rows whose AGE column has a value greater than 50. The LOAD function merely executes the given SQL query and returns the results. Although a query can be provided as part of LOAD, it is subject to the following restrictions: a) It must be a SELECT query. b) It must not contain GROUP BY, ORDER BY, LIMIT, or DISTINCT clauses. c) It must not contain aggregate functions.</li>
+ </ol> 
+ <p>In both cases, the ZooKeeper quorum should be passed to the PhoenixHBaseLoader as an argument to the constructor.</p>
+ <p>The load function makes a best effort to map Phoenix data types to Pig data types. See org.apache.phoenix.pig.util.TypeUtil for how each Phoenix data type is mapped to a Pig data type.</p>
+ <p>TODOs: 1) Phoenix 3.0 supports an ARRAY data type, but the Pig loader does not yet support it. 2) Support the use of string and date functions within the provided SQL query.</p>
+ <p>Example usage: Goal: determine the number of users per client ID. DDL: CREATE TABLE HIRES( CLIENTID INTEGER NOT NULL, EMPID INTEGER NOT NULL, NAME VARCHAR CONSTRAINT pk PRIMARY KEY(CLIENTID,EMPID)); Pig script:</p>
+ <p>raw = LOAD '<a class="externalLink" href="hbase://table/HIRES">hbase://table/HIRES</a>' USING org.apache.phoenix.pig.PhoenixHBaseLoader('localhost'); grpd = GROUP raw BY CLIENTID; cnt = FOREACH grpd GENERATE group AS CLIENT, COUNT(raw); DUMP cnt; </p>
 </div>
 			</div>
 		</div>

Modified: incubator/phoenix/site/source/src/site/markdown/pig_integration.md
URL: http://svn.apache.org/viewvc/incubator/phoenix/site/source/src/site/markdown/pig_integration.md?rev=1588814&r1=1588813&r2=1588814&view=diff
==============================================================================
--- incubator/phoenix/site/source/src/site/markdown/pig_integration.md (original)
+++ incubator/phoenix/site/source/src/site/markdown/pig_integration.md Sun Apr 20 18:23:06 2014
@@ -21,4 +21,39 @@ It is advised that the upsert operation 
 For example, let’s assume we are writing records n1...n10 to HBase. If the job fails in the middle of this process, we are left in an inconsistent state where n1...n7 made it to the phoenix tables but n8...n10 were missed. If we retry the same operation, n1...n7 would be re-upserted and n8...n10 would be upserted this time.
 
 ##Pig Loader
-A Pig data loader is not yet implemented, but there is work in progress tracked by [this](https://issues.apache.org/jira/browse/PHOENIX-11) JIRA.
+A Pig data loader allows users to read data from Phoenix-backed HBase tables within a Pig script.
+
+The load function provides two alternative ways to load data.
+ 1. Given a table name
+      A = load 'hbase://table/HIRES'  using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
+	  
+	The above loads data for all columns in the HIRES table.
+    To restrict the list of columns, specify the column names as part of the LOAD statement, as below:
+       A = load 'hbase://table/HIRES/ID,NAME'  using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
+     Here, only data for the ID and NAME columns is returned.
+
+ 2. Given a query
+      A = load 'hbase://query/SELECT ID,NAME FROM HIRES WHERE AGE > 50' using org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
+	The above query loads all rows whose AGE column has a value greater than 50. The LOAD function merely executes the given SQL query and returns the results.
+	Although a query can be provided as part of LOAD, it is subject to the following restrictions:
+	a) It must be a SELECT query.
+	b) It must not contain GROUP BY, ORDER BY, LIMIT, or DISTINCT clauses.
+	c) It must not contain aggregate functions.
+	
+  In both cases, the ZooKeeper quorum should be passed to the PhoenixHBaseLoader as an argument to the constructor.
+  
+  The load function makes a best effort to map Phoenix data types to Pig data types. See org.apache.phoenix.pig.util.TypeUtil for how each Phoenix data type is mapped to a Pig data type.
+  
+  TODOs:
+     Phoenix 3.0 supports an ARRAY data type, but the Pig loader does not yet support it.
+     Support the use of string and date functions within the provided SQL query.
+	 
+  Example usage:
+   Goal: determine the number of users per client ID.
+   DDL: CREATE TABLE HIRES( CLIENTID INTEGER NOT NULL, EMPID INTEGER NOT NULL, NAME VARCHAR CONSTRAINT pk PRIMARY KEY(CLIENTID,EMPID));
+   Pig script:
+   
+   raw = LOAD 'hbase://table/HIRES' USING org.apache.phoenix.pig.PhoenixHBaseLoader('localhost');
+   grpd = GROUP raw BY CLIENTID;
+   cnt = FOREACH grpd GENERATE group AS CLIENT, COUNT(raw);
+   DUMP cnt;  
\ No newline at end of file