Posted to commits@hawq.apache.org by yo...@apache.org on 2016/10/26 18:31:06 UTC

[06/14] incubator-hawq-docs git commit: more mods to HDFS plugin docs

more mods to HDFS plugin docs


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/86d13b31
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/86d13b31
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/86d13b31

Branch: refs/heads/develop
Commit: 86d13b312ea8591949b8a811973937ab60f74df9
Parents: 6ba64f9
Author: Lisa Owen <lo...@pivotal.io>
Authored: Thu Oct 20 15:36:01 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Thu Oct 20 15:36:01 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/86d13b31/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index fb8f463..213313f 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -36,9 +36,9 @@ The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no o
 
 | Option  | Description |
 |-------|-------------------------------------|
-| `-cat`    | Display file contents |
-| `-mkdir`    | Create directory in HDFS |
-| `-put`    | Copy file from local file system to HDFS |
+| `-cat`    | Display file contents. |
+| `-mkdir`    | Create directory in HDFS. |
+| `-put`    | Copy file from local file system to HDFS. |
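 
 To see the complete syntax of any individual option, use the `-help` option (shown here for `-put`):
 
 ``` shell
 $ hdfs dfs -help put
 ```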
 
 Create an HDFS directory for PXF example data files:
 
@@ -69,7 +69,7 @@ Add the data file to HDFS:
 $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
 ```
 
-Display the contents of `pxf_hdfs_simple.txt` stored in HDFS:
+Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS:
 
 ``` shell
 $ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
@@ -129,7 +129,7 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 | \<custom-option\>  | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section.|
 | FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file.  |
 | FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file.  |
-| FORMAT 'CUSTOM' | Use the`CUSTOM` `FORMAT` with  the `Avro` profiles. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\> |
+| FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with the `Avro` profile. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\>. |
 | \<formatting-properties\>    | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
 
 *Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
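 
 For example, a minimal sketch of an external table definition that uses the `HdfsTextSimple` profile to read the `pxf_hdfs_simple.txt` file added earlier (the table name, column definitions, and NameNode host and port below are illustrative; adjust them for your data and cluster):
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
             LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
           FORMAT 'TEXT' (delimiter=E',');
 ```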
@@ -214,7 +214,7 @@ gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
 
 Apache Avro is a data serialization framework where the data is serialized in a compact binary format. 
 
-Avro specifies data types be defined in JSON. Avro format files have an independent schema, also defined in JSON. In Avro files, the schema is stored with the data. 
+Avro specifies that data types be defined in JSON. Avro format files have an independent schema, also defined in JSON. In Avro files, the schema is stored with the data. An Avro schema, together with its data, is fully self-describing.
 
 ### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
 
@@ -239,7 +239,7 @@ The following table summarizes external mapping rules for Avro data.
 
 For complex types, the PXF `Avro` profile inserts default delimiters between collection items and values. You can use non-default delimiter characters by identifying values for specific `Avro` custom options in the `CREATE EXTERNAL TABLE` call. 
 
-The Avro profile supports the following \<custom-options\>:
+The `Avro` profile supports the following \<custom-options\>:
 
 | Option Name   | Description       
 |---------------|--------------------|                                                                                        
@@ -250,11 +250,11 @@ The Avro profile supports the following \<custom-options\>:
 
 ### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
 
-Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix. An Avro schema, together with its data, is fully self-describing.
+Avro schemas are defined using JSON and are composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
 
-Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type. The field type is another schema object
+Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type.
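 
 For illustration only, a hypothetical minimal `.avsc` file declaring a record with one `long` field and one `string` field might look like this:
 
 ``` json
 {
   "type": "record",
   "name": "example_record",
   "fields": [
     {"name": "id", "type": "long"},
     {"name": "username", "type": "string"}
   ]
 }
 ```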
 
-The examples in this section will be operating on Avro data with the following record schema:
+The examples in this section operate on Avro data described by the following record schema:
 
 - id - long
 - username - string
@@ -331,7 +331,7 @@ Enter the following data into `pxf_hdfs_avro.txt`:
 
 The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
 
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html); the jar file resides in the current directory:
+Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html); the commands below assume the jar file resides in the current directory:
 
 ``` shell
 $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
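 # (Optional) verify the conversion; the avro-tools `tojson` subcommand dumps
 # each record of an Avro data file back to JSON, one record per line:
 $ java -jar ./avro-tools-1.8.1.jar tojson /tmp/pxf_hdfs_avro.avro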
@@ -370,15 +370,19 @@ gpadmin=# SELECT * FROM pxf_hdfs_avro;
 (2 rows)
 ```
 
-Process the delimited components in the text columns as necessary for your application. For example, the following command uses the `string_to_array` function to convert entries in the `followers` field to a text array column in a new view. The view is then queried to filter rows based on whether a particular follower appears in the array:
+Process the delimited components in the text columns as necessary for your application. For example, the following command uses the built-in `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
 
 ``` sql
 gpadmin=# CREATE VIEW followers_view AS 
   SELECT username, address, string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] 
     AS followers 
   FROM pxf_hdfs_avro;
+```
+
+Query the view to filter rows based on whether a particular follower appears in the array:
 
-gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}'
+``` sql
+gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
 ```
 
 ``` pre
@@ -389,7 +393,7 @@ gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john
 
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
 
-To access external HDFS data in a High Availability HDFS cluster, change the URI LOCATION clause to use \<HA-nameservice\> rather than  \<host\>[:\<port\>].
+To access external HDFS data in a High Availability HDFS cluster, change the `CREATE EXTERNAL TABLE` `LOCATION` clause to use \<HA-nameservice\> rather than \<host\>[:\<port\>].
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] | LIKE <other_table> )
@@ -397,7 +401,7 @@ gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...]
          FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
 ```
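 
 For example, if the cluster's configured HDFS nameservice were `mycluster` (an illustrative name), the earlier `HdfsTextSimple` table sketch would instead specify:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_ha(location text, month text, num_orders int, total_sales float8)
             LOCATION ('pxf://mycluster/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
           FORMAT 'TEXT' (delimiter=E',');
 ```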
 
-The opposite is true when a highly available HDFS cluster is reverted to a single NameNode configuration. In that case, any table definition that has the \<HA-nameservice\> specified should use the \<host\>[:\<port\>] syntax.
+The opposite is true when a highly available HDFS cluster is reverted to a single NameNode configuration. In that case, any table definition that specifies \<HA-nameservice\> should be updated to use the \<host\>[:\<port\>] syntax.
 
 
 ## <a id="hdfs_advanced"></a>Advanced