Posted to commits@hawq.apache.org by yo...@apache.org on 2016/10/26 18:31:05 UTC

[05/14] incubator-hawq-docs git commit: use more descriptive file names

use more descriptive file names


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/6ba64f94
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/6ba64f94
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/6ba64f94

Branch: refs/heads/develop
Commit: 6ba64f94d5b11397c98f46eb14d5c6e48d17a6cc
Parents: fd029d5
Author: Lisa Owen <lo...@pivotal.io>
Authored: Thu Oct 20 14:12:43 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Thu Oct 20 14:12:43 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 43 +++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 23 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/6ba64f94/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 9914ca9..fb8f463 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -2,7 +2,7 @@
 title: Accessing HDFS File Data
 ---
 
-HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited and comma-separated-value text files.  The HDFS plug-in also supports the Avro binary format.
+HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited and comma-separated-value format text files.  The HDFS plug-in also supports the Avro binary format.
 
 This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.
 
@@ -17,7 +17,7 @@ Before working with HDFS file data using HAWQ and PXF, ensure that:
 
 The PXF HDFS plug-in supports reading the following file formats:
 
-- TextFile - comma-separated value (.csv) or delimited format plain text file
+- Text File - comma-separated value (.csv) or delimited format plain text file
 - Avro - JSON-defined, schema-based data serialization format
 
 The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
@@ -49,10 +49,10 @@ $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
 Create a delimited plain text file:
 
 ``` shell
-$ vi /tmp/pxf_hdfs_ts.txt
+$ vi /tmp/pxf_hdfs_simple.txt
 ```
 
-Copy and paste the following data into `pxf_hdfs_ts.txt`:
+Copy and paste the following data into `pxf_hdfs_simple.txt`:
 
 ``` pre
 Prague,Jan,101,4875.33
@@ -66,22 +66,22 @@ Notice the use of the comma `,` to separate the four data fields.
 Add the data file to HDFS:
 
 ``` shell
-$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_ts.txt /data/pxf_examples/
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
 ```
 
-Display the contents of `pxf_hdfs_ts.txt` stored in HDFS:
+Display the contents of `pxf_hdfs_simple.txt` stored in HDFS:
 
 ``` shell
-$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_ts.txt
+$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
 ```
 
 Create a second delimited plain text file:
 
 ``` shell
-$ vi /tmp/pxf_hdfs_tm.txt
+$ vi /tmp/pxf_hdfs_multi.txt
 ```
 
-Copy/paste the following data into `pxf_hdfs_tm.txt`:
+Copy/paste the following data into `pxf_hdfs_multi.txt`:
 
 ``` pre
 "4627 Star Rd.
@@ -101,13 +101,13 @@ Notice the use of the colon `:` to separate the three fields. Also notice the qu
 Add the data file to HDFS:
 
 ``` shell
-$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_tm.txt /data/pxf_examples/
+$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_multi.txt /data/pxf_examples/
 ```
 
 You will use these HDFS files in later sections.
 
 ## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
-The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, and `Avro`.
+The PXF HDFS plug-in supports the `HdfsTextSimple`, `HdfsTextMulti`, and `Avro` profiles.
 
 Use the following syntax to create a HAWQ external table representing HDFS data:
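A rough sketch of the general form, inferred from the examples in the sections that follow (the angle-bracket placeholders are illustrative, not verbatim syntax from the file):

``` sql
CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] )
  LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
  FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
```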
 
@@ -145,11 +145,11 @@ Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files
 | delimiter    | The delimiter character in the file. Default value is a comma `,`.|
 
 
-The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_ts.txt` file you created and added to HDFS in an earlier section:
+The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_ts.txt?PROFILE=HdfsTextSimple') 
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
           FORMAT 'TEXT' (delimiter=E',');
 gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
 ```
@@ -164,11 +164,11 @@ gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
 (4 rows)
 ```
 
-Create a second external table from `pxf_hdfs_ts.txt`, this time using the `CSV` `FORMAT`:
+Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_ts.txt?PROFILE=HdfsTextSimple') 
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
           FORMAT 'CSV';
 gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
 ```
@@ -185,11 +185,11 @@ Use the `HdfsTextMulti` profile when reading plain text files with delimited sin
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. |
 
-The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_tm.txt` file you created and added to HDFS in an earlier section:
+The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti') 
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
           FORMAT 'CSV' (delimiter=E':');
 gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
 ```
@@ -246,12 +246,11 @@ The Avro profile supports the following \<custom-options\>:
 | COLLECTION_DELIM | The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps an Avro complex data type to a text column. The default is a comma `,` character. |
 | MAPKEY_DELIM | The delimiter character(s) to place between the key and value of a map entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
 | RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
-| SCHEMA-DATA | The data schema file used to create and read the HDFS file. This option has no default value. |
-| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests will be handled in a single thread. |
+
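The delimiter options above are supplied as query parameters in the `LOCATION` URI. A minimal sketch, assuming an Avro data file at a hypothetical HDFS path (the table name, column list, and file name are illustrative):

``` sql
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro_example(id bigint, username text, followers text)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:')
          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
```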
 
 ### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
 
-Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
+Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix. An Avro schema, together with its data, is fully self-describing.
 
 Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type. The field type is another schema object.
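For illustration, a minimal schema describing a record with two fields might look like the following (the record and field names here are hypothetical, not the schema used later in this topic):

``` json
{
  "type": "record",
  "name": "example_schema",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "username", "type": "string"}
  ]
}
```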
 
@@ -314,8 +313,6 @@ Copy and paste the following text into `avro_schema.avsc`:
 }
 ```
 
-An Avro schema, together with its data, is fully self-describing.  
-
 ### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
 
 Create a text file named `pxf_hdfs_avro.txt`:
@@ -404,4 +401,4 @@ The opposite is true when a highly available HDFS cluster is reverted to a singl
 
 
 ## <a id="hdfs_advanced"></a>Advanced
-If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.
\ No newline at end of file
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.
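Once registered, a custom profile is referenced by name in the `LOCATION` URI exactly as the built-in profiles are. A hypothetical sketch (`MyHdfsProfile` is an assumed name, not a shipped profile):

``` sql
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_custom(recordkey text, value text)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/?PROFILE=MyHdfsProfile')
          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
```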