Posted to commits@hawq.apache.org by yo...@apache.org on 2016/10/31 22:13:40 UTC

[30/50] incubator-hawq-docs git commit: use numbered steps throughout

use numbered steps throughout


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/f9c340af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/f9c340af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/f9c340af

Branch: refs/heads/tutorial-proto
Commit: f9c340af7fc117da341e63689262fa3e6b2da0da
Parents: cd6d034
Author: Lisa Owen <lo...@pivotal.io>
Authored: Tue Oct 25 14:02:27 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Tue Oct 25 14:02:27 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 321 +++++++++++++++++++----------------
 1 file changed, 174 insertions(+), 147 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/f9c340af/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 06f8b4d..a853e3e 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -47,7 +47,7 @@ Perform the following steps to create data files used in subsequent exercises:
 1. Create an HDFS directory for PXF example data files:
 
     ``` shell
-     $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+    $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
     ```
 
 2. Create a delimited plain text file:
@@ -148,36 +148,39 @@ Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. Default value is a comma `,`.|
 
+### <a id="profile_hdfstextsimple_query"></a>Query With HdfsTextSimple Profile
 
-The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
+Perform the following steps to create and query external tables accessing the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'TEXT' (delimiter=E',');
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
-```
+1. Use the `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file:
 
-``` pre
-   location    | month | num_orders | total_sales 
----------------+-------+------------+-------------
- Prague        | Jan   |        101 |     4875.33
- Rome          | Mar   |         87 |     1557.39
- Bangalore     | May   |        317 |     8936.99
- Beijing       | Jul   |        411 |    11600.67
-(4 rows)
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'TEXT' (delimiter=E',');
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
+    ```
 
-Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
+    ``` pre
+       location    | month | num_orders | total_sales 
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     Beijing       | Jul   |        411 |    11600.67
+    (4 rows)
+    ```
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'CSV';
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
-```
+2. Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
 
-Notice: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default.
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'CSV';
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
+    ```
+
+    Note: When you specify `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required; a comma is the default delimiter.
 
 ## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
 
@@ -189,30 +192,39 @@ Use the `HdfsTextMulti` profile when reading plain text files with delimited sin
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. |
 
-The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section:
+### <a id="profile_hdfstextmulti_query"></a>Query With HdfsTextMulti Profile
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
-          FORMAT 'CSV' (delimiter=E':');
-gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
-```
+Perform the following steps to create and query an external HAWQ table accessing the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section.
 
-``` pre
-         address          | month | year 
---------------------------+-------+------
- 4627 Star Rd.            | Sept  | 2017
- San Francisco, CA  94107           
- 113 Moon St.             | Jan   | 2018
- San Diego, CA  92093               
- 51 Belt Ct.              | Dec   | 2016
- Denver, CO  90123                  
- 93114 Radial Rd.         | Jul   | 2017
- Chicago, IL  60605                 
- 7301 Brookview Ave.      | Dec   | 2018
- Columbus, OH  43213                
-(5 rows)
-```
+1. Use the `HdfsTextMulti` profile to create a queryable external table from the `pxf_hdfs_multi.txt` file:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
+              FORMAT 'CSV' (delimiter=E':');
+    ```
+    
+2. Query the `pxf_hdfs_textmulti` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
+    ```
+
+    ``` pre
+             address          | month | year 
+    --------------------------+-------+------
+     4627 Star Rd.            | Sept  | 2017
+     San Francisco, CA  94107           
+     113 Moon St.             | Jan   | 2018
+     San Diego, CA  92093               
+     51 Belt Ct.              | Dec   | 2016
+     Denver, CO  90123                  
+     93114 Radial Rd.         | Jul   | 2017
+     Chicago, IL  60605                 
+     7301 Brookview Ave.      | Dec   | 2018
+     Columbus, OH  43213                
+    (5 rows)
+    ```
 
 ## <a id="profile_hdfsavro"></a>Avro Profile
 
@@ -252,7 +264,7 @@ The `Avro` profile supports the following \<custom-options\>:
 | RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
 
 
-### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas and Data
 
 Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
 
@@ -267,133 +279,148 @@ The examples in this section will be operating on Avro data fields with the foll
 - address - record comprised of street number (int), street name (string), and city (string)
 - relationship - enumerated type
 
-Create an Avro schema file to represent the above schema:
+#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts_99"></a>Create Sample Schema
 
-``` shell
-$ vi /tmp/avro_schema.avsc
-```
+Perform the following steps to create an Avro schema file representing the example schema described above.
 
-Copy and paste the following text into `avro_schema.avsc`:
-
-``` json
-{
-"type" : "record",
-  "name" : "example_schema",
-  "namespace" : "com.example",
-  "fields" : [ {
-    "name" : "id",
-    "type" : "long",
-    "doc" : "Id of the user account"
-  }, {
-    "name" : "username",
-    "type" : "string",
-    "doc" : "Name of the user account"
-  }, {
-    "name" : "followers",
-    "type" : {"type": "array", "items": "string"},
-    "doc" : "Users followers"
-  }, {
-    "name": "fmap",
-    "type": {"type": "map", "values": "long"}
-  }, {
-    "name": "relationship",
-    "type": {
-        "type": "enum",
-        "name": "relationshipEnum",
-        "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
-    }
-  }, {
-    "name": "address",
-    "type": {
-        "type": "record",
-        "name": "addressRecord",
-        "fields": [
-            {"name":"number", "type":"int"},
-            {"name":"street", "type":"string"},
-            {"name":"city", "type":"string"}]
+1. Create a file named `avro_schema.avsc`:
+
+    ``` shell
+    $ vi /tmp/avro_schema.avsc
+    ```
+
+2. Copy and paste the following text into `avro_schema.avsc`:
+
+    ``` json
+    {
+    "type" : "record",
+      "name" : "example_schema",
+      "namespace" : "com.example",
+      "fields" : [ {
+        "name" : "id",
+        "type" : "long",
+        "doc" : "Id of the user account"
+      }, {
+        "name" : "username",
+        "type" : "string",
+        "doc" : "Name of the user account"
+      }, {
+        "name" : "followers",
+        "type" : {"type": "array", "items": "string"},
+        "doc" : "Users followers"
+      }, {
+        "name": "fmap",
+        "type": {"type": "map", "values": "long"}
+      }, {
+        "name": "relationship",
+        "type": {
+            "type": "enum",
+            "name": "relationshipEnum",
+            "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
+        }
+      }, {
+        "name": "address",
+        "type": {
+            "type": "record",
+            "name": "addressRecord",
+            "fields": [
+                {"name":"number", "type":"int"},
+                {"name":"street", "type":"string"},
+                {"name":"city", "type":"string"}]
+        }
+      } ],
+      "doc:" : "A basic schema for storing messages"
     }
-  } ],
-  "doc:" : "A basic schema for storing messages"
-}
-```
+    ```
 
-### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
+#### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Create Sample Avro Data File (JSON)
 
-Create a text file named `pxf_hdfs_avro.txt`:
+Perform the following steps to create a sample Avro data file conforming to the above schema.
 
-``` shell
-$ vi /tmp/pxf_hdfs_avro.txt
-```
+1.  Create a text file named `pxf_hdfs_avro.txt`:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_avro.txt
+    ```
 
-Enter the following data into `pxf_hdfs_avro.txt`:
+2. Enter the following data into `pxf_hdfs_avro.txt`:
 
-``` pre
-{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+    ``` pre
+    {"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
     
-{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
-```
+    {"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
+    ```
 
-The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
+    The sample data uses a comma `,` to separate top-level records, and a colon `:` to separate map keys from values and record field names from values.
 
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
+3. Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
 
-``` shell
-$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
-```
+    ``` shell
+    $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
+    ```
 
-The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS:
+    The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. 
+    
+4. Copy the generated file to HDFS:
 
-``` shell
-$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
-```
-### <a id="topic_avro_querydata"></a>Querying Avro Data
+    ``` shell
+    $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
+    ```
+    
+### <a id="topic_avro_querydata"></a>Querying With Avro Profile
 
-Create a queryable external table from this Avro file:
+Perform the following steps to create and query an external table accessing the `pxf_hdfs_avro.avro` file you added to HDFS in the previous section. When creating the table:
 
 -  Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text). 
 -  Map the remaining complex fields to type text.
--  Explicitly set the record, map, and collection delimiters using the Avro profile custom options:
+-  Explicitly set the record, map, and collection delimiters using the Avro profile custom options.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
-          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-```
 
-A simple query of the external table shows the components of the complex type data separated with delimiters:
+1. Use the `Avro` profile to create a queryable external table from the `pxf_hdfs_avro.avro` file:
 
-``` sql
-gpadmin=# SELECT * FROM pxf_hdfs_avro;
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+    ```
 
-``` pre
- id | username |   followers    |        fmap         | relationship |                      address                      
-----+----------+----------------+--------------------+--------------+---------------------------------------------------
-  1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
-  2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
-(2 rows)
-```
+2. Perform a simple query of the `pxf_hdfs_avro` table:
 
-Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_avro;
+    ```
 
-``` sql
-gpadmin=# CREATE VIEW followers_view AS 
+    ``` pre
+     id | username |   followers    |        fmap         | relationship |                      address                      
+    ----+----------+----------------+---------------------+--------------+---------------------------------------------------
+      1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
+      2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
+    (2 rows)
+    ```
+
+    The query output shows the components of the complex type data separated with the delimiters you specified.
+
+
+3. Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+
+    ``` sql
+    gpadmin=# CREATE VIEW followers_view AS 
   SELECT username, address, string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] 
-    AS followers 
-  FROM pxf_hdfs_avro;
-```
+        AS followers 
+      FROM pxf_hdfs_avro;
+    ```
 
-Query the view to filter rows based on whether a particular follower appears in the array:
+4. Query the view to filter rows based on whether a particular follower appears in the array:
 
-```
-gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
-```
+    ``` sql
+    gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
+    ```
 
-``` pre
- username |                   address                   
-----------+---------------------------------------------
- jim      | {number:9,street:deer creek,city:palo alto}
-```
+    ``` pre
+     username |                   address                   
+    ----------+---------------------------------------------
+     jim      | {number:9,street:deer creek,city:palo alto}
+    ```
 
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster