Posted to commits@hawq.apache.org by yo...@apache.org on 2016/10/27 23:32:55 UTC

[01/14] incubator-hawq-docs git commit: restructure PXF Hive plug-in page; add more relevant examples

Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 01f3f8e9d -> bf5b6d0df


restructure PXF Hive plug-in page; add more relevant examples


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/0398a62f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/0398a62f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/0398a62f

Branch: refs/heads/develop
Commit: 0398a62fefd3627273927f938b4d082a25bf3003
Parents: 37857ea
Author: Lisa Owen <lo...@pivotal.io>
Authored: Mon Sep 26 14:37:04 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Mon Sep 26 14:37:04 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 621 +++++++++++++++++++++++++++++--------------
 1 file changed, 419 insertions(+), 202 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/0398a62f/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index db3e53c..55a0aeb 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -2,121 +2,408 @@
 title: Accessing Hive Data
 ---
 
-This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets and supports multiple data formats, including comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads data stored in Hive, as well as in HDFS and HBase.
+
+This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
+
+-  Creating an external table in PXF and querying that table
+-  Querying Hive tables via PXF's integration with HCatalog
 
 ## <a id="installingthepxfhiveplugin"></a>Prerequisites
 
-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:
 
 -   The PXF HDFS plug-in is installed on all cluster nodes.
 -   The PXF Hive plug-in is installed on all cluster nodes.
 -   The Hive JAR files and conf directory are installed on all cluster nodes.
--   Test PXF on HDFS before connecting to Hive or HBase.
+-   You have tested PXF on HDFS.
 -   You are running the Hive Metastore service on a machine in your cluster.
 -   You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode (a sample entry follows this list).
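+
+The `hive.metastore.uris` property holds the Thrift URI (or a comma-separated list of URIs) of the Hive Metastore service. A representative `hive-site.xml` entry follows; the host name `metastorehost` is a placeholder for the value used in your environment, and `9083` is the conventional Metastore port:
+
+``` xml
+<property>
+    <name>hive.metastore.uris</name>
+    <value>thrift://metastorehost:9083</value>
+</property>
+```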
 
+## <a id="topic_p2s_lvl_25"></a>Hive File Formats
+
+Hive supports several file formats:
+
+-   TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
+-   SequenceFile - flat file consisting of binary key/value pairs
+-   RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
+-   ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
+-   Parquet - compressed columnar data representation
+-   Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
+
+## <a id="topic_p2s_lvl_25"></a>Data Type Mapping
+
+### <a id="hive_primdatatypes"></a>Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive types.
+
+| Hive Data Type  | HAWQ Data Type |
+|-------|---------------------------|
+| boolean    | bool |
+| int   | int4 |
+| smallint   | int2 |  
+| tinyint   | int2 | 
+| bigint   | int8 | 
+| decimal  |  numeric  |
+| float   | float4 | 
+| double   | float8 | 
+| string   | text | 
+| binary   | bytea |
+| char   | bpchar |
+| varchar   | varchar |
+| timestamp   | timestamp |
+| date   | date |
+
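+For example, consider a hypothetical Hive table that uses several primitive types; the table and column names here are illustrative only:
+
+``` sql
+hive> CREATE TABLE mapping_example (flag boolean, qty smallint, total bigint, price decimal, label string)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+```
+
+Per the mapping table above, a HAWQ external table referencing `mapping_example` (using the external table syntax detailed later in this topic) would declare its columns as:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE mapping_example_pxf (flag bool, qty int2, total int8, price numeric, label text)
+             LOCATION ('pxf://namenode:51200/default.mapping_example?PROFILE=Hive')
+             FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+```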
+
+### <a id="topic_b4v_g3n_25"></a>Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`.  While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract components of complex data types.
+
+A complex data type example is provided later in this section.
+
+
+## <a id="hive_sampledataset"></a>Sample Data Set
+
+Examples used in this section will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use. First, create a text file:
+
+```
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+Add the following data to `pxf_hive_datafile.txt`, employing a comma `,` to separate the field values:
+
+```
+Prague,Jan,101,4875.33
+Rome,Mar,87,1557.39
+Bangalore,May,317,8936.99
+Beijing,Jul,411,11600.67
+San Francisco,Sept,156,6846.34
+Paris,Nov,159,7134.56
+San Francisco,Jan,113,5397.89
+Prague,Dec,333,9894.77
+Bangalore,Jul,271,8320.55
+Beijing,Dec,100,4248.41
+```
+Make note of the path to this file; you will use it in later exercises.
+
+
 ## <a id="hivecommandline"></a>Hive Command Line
 
-To start the Hive command line and work directly on a Hive table:
+The Hive command line is an interactive subsystem, similar to the `psql` client. To start the Hive command line:
 
 ``` shell
-$ hive
+$ HADOOP_USER_NAME=hdfs hive
 ```
 
-Here is an example of how to create and load data into a sample Hive table from an existing file.
+The default Hive database is named `default`. Create a Hive table named `sales_info` to expose our sample data set. Create this table in the `default` database: 
 
 ``` sql
-hive> CREATE TABLE test (name string, type string, supplier_key int, full_price double) row format delimited fields terminated by ',';
-hive> LOAD DATA local inpath '/local/path/data.txt' into table test;
+hive> CREATE TABLE sales_info (location string, month string,
+        number_of_orders int, total_sales double) 
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+        STORED AS textfile;
 ```
 
-## <a id="topic_p2s_lvl_25"></a>Using PXF Tables to Query Hive
+Notice:
 
-Hive tables are defined in a specific way in PXF, regardless of the underlying file storage format. The PXF Hive plug-ins automatically detect source tables in the following formats:
+- The `STORED AS textfile` subclause instructs Hive to create the table in TextFile (the default) format.  Hive textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
+- The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
 
--   Text based
--   SequenceFile
--   RCFile
--   ORCFile
--   Parquet
--   Avro
+Load the `pxf_hive_datafile.txt` sample data file into the newly-created `sales_info` table:
 
-The source table can also be a combination of these types. The PXF Hive plug-in uses this information to query the data in runtime.
+``` sql
+hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt' 
+        INTO TABLE sales_info;
+```
+
+Perform a query on `sales_info` to verify the data was loaded successfully:
 
--   **[Syntax Example](../pxf/HivePXF.html#syntax2)**
+``` sql
+hive> SELECT * FROM sales_info;
+```
 
--   **[Hive Complex Types](../pxf/HivePXF.html#topic_b4v_g3n_25)**
+In examples later in this section, you will access the `sales_info` Hive table directly via PXF. You will also insert `sales_info` data into tables of other Hive file format types, and use PXF to access those directly as well.
 
-### <a id="syntax2"></a>Syntax Example
+## <a id="topic_p2s_lvl_25"></a>Querying External Hive Data
 
-The following�PXF table definition is valid for any Hive file storage type.
+Use the following syntax to create a HAWQ external table representing Hive data:
 
 ``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
+CREATE EXTERNAL TABLE table_name 
     ( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://namenode[:port]/hive-db-name.hive-table-name?<pxf parameters>[&custom-option=value...]')FORMAT 'CUSTOM' (formatter='pxfwritable_import')
+LOCATION ('pxf://host[:port]/hive-db-name.hive-table-name
+    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=delim]')
+FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='delim')
 ```
 
-where `<pxf parameters>` is:
+Hive-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` call are described below.
 
-``` pre
-   FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
- | PROFILE=profile-name
+| Keyword  | Value |
+|-------|-------------------------------------|
+| host    | The HDFS NameNode. |
+| hive-db-name    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
+| hive-table-name    | Name of the Hive table. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |  
+| DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  `delim` must be a single ASCII character or specified in hexadecimal representation. |
+| FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |
+| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The delimiter value must be specified a second time in the `delimiter='delim'` formatting option. |
+
+
+## <a id="profile_hive"></a>Example: Hive Profile 
+
+The `Hive` profile works with any Hive file format table.
+
+Start the `psql` client, then use the PXF `Hive` profile to create a queryable HAWQ external table referencing the Hive `sales_info` textfile format table created earlier:
+
+``` shell
+$ psql -d postgres
+```
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text, num_orders int, total_sales float8)
+             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
+             FORMAT 'custom' (formatter='pxfwritable_import');
 ```
 
+``` sql
+postgres=# select * from salesinfo_hiveprofile; 
+```
 
-If `hive-db-name` is omitted, pxf will default to the Hive `default` database.
+``` shell                     
+   location    | month | num_orders | total_sales 
+---------------+-------+------------+-------------
+ Prague        | Jan   |        101 |     4875.33
+ Rome          | Mar   |         87 |     1557.39
+ Bangalore     | May   |        317 |     8936.99
+ ...
 
-**Note:** The port is the connection port for the PXF service. If the port is omitted, PXF assumes that High Availability (HA) is enabled and connects to the HA name service port, 51200 by default. The HA name service port can be changed by setting the pxf\_service\_port configuration parameter.
+```
 
-PXF has three built-in profiles for Hive tables:
+## <a id="profile_hivetext"></a>Example: HiveText Profile
 
--   Hive
--   HiveRC
--   HiveText
+Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile. 
 
-The Hive profile works with any Hive storage type. 
-The following example creates a readable HAWQ external table representing a Hive table named `accessories` in the `inventory` Hive database using the PXF Hive profile:
+**Note**: When using the `HiveText` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
+
+Use the PXF `HiveText` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
+             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c') 
+             FORMAT 'TEXT' (delimiter=E',');
+```
+
+(You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+
+Notice:
+
+- The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
+- The `FORMAT` subclause `delimiter` value is specified as the single ASCII comma character `','`. The `E` prefix marks the value as a PostgreSQL escape string literal, in which backslash escape sequences (for example, `E'\t'` for a tab) are interpreted; for a plain comma it has no additional effect.
+
+Query the external table:
+
+``` sql
+postgres=# select * from salesinfo_hivetextprofile where location='Beijing'; 
+```
+
+``` shell                     
+ location | month | num_orders | total_sales 
+----------+-------+------------+-------------
+ Beijing  | Jul   |        411 |    11600.67
+ Beijing  | Dec   |        100 |     4248.41
+(2 rows)
+```
+
+## <a id="profile_hiverc"></a>Example: HiveRC Profile
+
+### <a id="profile_hiverc_rcfiletbl"></a>RCFile Format Hive Table
+
+The RCFile Hive format is used for row columnar formatted data. Create a Hive table with RCFile format:
 
 ``` shell
-$ psql -d postgres
+$ HADOOP_USER_NAME=hdfs hive
+```
+
+``` sql
+hive> CREATE TABLE sales_info_rcfile (location string, month string,
+        number_of_orders int, total_sales double) 
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+        STORED AS rcfile;
 ```
 
+Insert the data from the `sales_info` table into `sales_info_rcfile`:
+
 ``` sql
-postgres=# CREATE EXTERNAL TABLE hivetest(id int, newid int)
-LOCATION ('pxf://namenode:51200/inventory.accessories?PROFILE=Hive')
-FORMAT 'custom' (formatter='pxfwritable_import');
+hive> INSERT INTO TABLE sales_info_rcfile SELECT * FROM sales_info;
 ```
 
+A copy of the sample data set is now stored in RCFile format in `sales_info_rcfile`. Perform a Hive query on `sales_info_rcfile` to verify the data was loaded successfully:
 
-Use HiveRC and HiveText to query RC and Text formats respectively. The HiveRC and HiveText profiles are faster than the generic Hive profile. When using the HiveRC and HiveText profiles, you must specify a DELIMITER option in the LOCATION clause. See [Using Profiles to Read and Write Data](ReadWritePXF.html#readingandwritingdatawithpxf) for more information on profiles.
+``` sql
+hive> SELECT * FROM sales_info_rcfile;
+```
+### <a id="profile_hiverc_pxfquery"></a>PXF Query
 
+Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile. 
 
-### <a id="topic_b4v_g3n_25"></a>Hive Complex Types
+**Note**: When using the `HiveRC` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
 
-PXF tables support Hive data types that are not primitive types. The supported Hive complex data types are array, struct, map, and union. This Hive `CREATE TABLE` statement, for example, creates a table with each of these complex data types:
+Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the Hive `sales_info_rcfile` table created in the previous section:
 
 ``` sql
-hive> CREATE TABLE sales_collections (
-  item STRING,
-  price FLOAT,
-  properties ARRAY<STRING>,
-  hash MAP<STRING,FLOAT>,
-  delivery_address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>,
-  others UNIONTYPE<FLOAT, BOOLEAN, STRING>
-)  
-ROW FORMAT DELIMITED FIELDS
-TERMINATED BY '\001' COLLECTION ITEMS TERMINATED BY '\002' MAP KEYS TERMINATED BY '\003' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
-hive> LOAD DATA LOCAL INPATH '/local/path/<some data file>' INTO TABLE sales_collection;
+postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
+             LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c') 
+             FORMAT 'TEXT' (delimiter=E',');
+```
+
+(Again, you can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+
+Query the external table:
+
+``` sql
+postgres=# select location, total_sales from salesinfo_hivercprofile; 
+```
+
+``` shell                     
+   location    | total_sales 
+---------------+-------------
+ Prague        |     4875.33
+ Rome          |     1557.39
+ Bangalore     |     8936.99
+ Beijing       |    11600.67
+ ...
+```
+
+## <a id="profileperf"></a>Profile Performance Considerations
+
+The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
+
+They are optimized for their specific file formats; the generic `Hive` profile must instead determine the underlying file format of the data at runtime.
+
+## <a id="complex_dt_example"></a>Example: Complex Data Types
+
+Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a key/value pair map. 
+
+The data set includes fields with the following names and data types:
+
+- index - int
+- name - string
+- intarray - array of integers
+- propmap - map of string keys and string value pairs
+
+When specifying an array field in a Hive table, you must identify the terminator for each item in the collection. Similarly, the map key termination character must also be specified.
+
+Create a text file from which you will load the data set:
+
+```
+$ vi /tmp/pxf_hive_complex.txt
+```
+
+Add the following data to `pxf_hive_complex.txt`. The data uses a comma `,` to separate field values, the percent symbol `%` to separate collection items, and a colon `:` to terminate map keys:
+
+```
+3,Prague,1%2%3,zone:euro%status:up
+89,Rome,4%5%6,zone:euro
+400,Bangalore,7%8%9,zone:apac%status:pending
+183,Beijing,0%1%2,zone:apac
+94,Sacramento,3%4%5,zone:noam%status:down
+101,Paris,6%7%8,zone:euro%status:up
+56,Frankfurt,9%0%1,zone:euro
+202,Jakarta,2%3%4,zone:apac%status:up
+313,Sydney,5%6%7,zone:apac%status:pending
+76,Atlanta,8%9%0,zone:noam%status:down
+```
+
+Create a Hive table to represent this data:
+
+``` shell
+$ HADOOP_USER_NAME=hdfs hive
+```
+
+``` sql
+CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<INT>, propmap MAP<string, string>) 
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+  COLLECTION ITEMS TERMINATED BY '%' 
+  MAP KEYS TERMINATED BY ':' 
+  STORED AS TEXTFILE;
+```
+
+Notice:
+
+- The `FIELDS TERMINATED BY` identifies a comma as the field terminator.
+- The `COLLECTION ITEMS TERMINATED BY` specifies the percent sign as the collection items (array item, map key/value pair) terminator.
+- The `MAP KEYS TERMINATED BY` ientifies a colon as the terminator for map keys.
+
+Load the `pxf_hive_complex.txt` sample data file into the newly-created `table_complextypes` table:
+
+``` sql
+hive> LOAD DATA local INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
+```
+
+Perform a query on `table_complextypes` to verify the data was loaded successfully:
+
+``` sql
+hive> SELECT * FROM table_complextypes;
+```
+
+``` shell
+3	Prague	[1,2,3]	{"zone":"euro","status":"up"}
+89	Rome	[4,5,6]	{"zone":"euro"}
+400	Bangalore	[7,8,9]	{"zone":"apac","status":"pending"}
+...
+```
+
+Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive table_complextypes:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
+             LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive') 
+             FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+```
+
+Notice that the integer array and map complex types are mapped to PXF type text.
+
+Query the external table:
+
+``` sql
+select * from complextypes_hiveprofile; 
+```
+
+``` shell                      
+ index |    name    | intarray |              propmap               
+-------+------------+----------+------------------------------------
+     3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
+    89 | Rome       | [4,5,6]  | {"zone":"euro"}
+   400 | Bangalore  | [7,8,9]  | {"zone":"apac","status":"pending"}
+   183 | Beijing    | [0,1,2]  | {"zone":"apac"}
+    94 | Sacramento | [3,4,5]  | {"zone":"noam","status":"down"}
+   101 | Paris      | [6,7,8]  | {"zone":"euro","status":"up"}
+    56 | Frankfurt  | [9,0,1]  | {"zone":"euro"}
+   202 | Jakarta    | [2,3,4]  | {"zone":"apac","status":"up"}
+   313 | Sydney     | [5,6,7]  | {"zone":"apac","status":"pending"}
+    76 | Atlanta    | [8,9,0]  | {"zone":"noam","status":"down"}
+(10 rows)
 ```
 
-You can use HAWQ functions or application code to extract the components of the complex data columns as needed.
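+
+Because the complex types surface as plain `text`, standard string functions can extract their components. The following is a minimal sketch, assuming the `complextypes_hiveprofile` table above; `substring()` with a POSIX regular expression pulls the `zone` value out of the `propmap` column:
+
+``` sql
+postgres=# SELECT name, substring(propmap from '"zone":"([a-z]+)"') AS zone
+             FROM complextypes_hiveprofile;
+```
+
+For the `Prague` row, for example, this returns a `zone` value of `euro`.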
 
 ## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
 
 Hive tables can be queried directly through HCatalog integration with HAWQ and PXF, regardless of the underlying file storage format.
 
-Previously, in order to query Hive tables using HAWQ and PXF, you needed to create an external table in PXF that described the target table's Hive metadata. Since HAWQ is now integrated with HCatalog, HAWQ can use metadata stored in HCatalog instead of external tables created for PXF. HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
+In previous sections, you created external tables in PXF that specified the target Hive table's schema and location. Another option for querying Hive tables is to take advantage of HAWQ's integration with HCatalog. This integration allows HAWQ to directly use table metadata stored in HCatalog. 
+
+HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
 
 -   You do not need to know the table schema of your Hive tables
 -   You do not need to manually enter information about Hive table location or format
@@ -130,16 +417,13 @@ The following diagram depicts how HAWQ integrates with HCatalog to query Hive ta
 2.  HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog.
 3.  PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
 
--   **[Usage](../pxf/HivePXF.html#topic_j1l_y55_c5)**
-
--   **[Limitations](../pxf/HivePXF.html#topic_r5k_pst_25)**
 
 ### <a id="topic_j1l_y55_c5"></a>Usage
 
 To enable HCatalog query integration in HAWQ, perform the following steps:
 
 1.  Make sure your deployment meets the requirements listed in [Prerequisites](#installingthepxfhiveplugin).
-2.  If necessary, set the `pxf_service_address` global configuration property to a hostname or IP address and port where you have installed the PXF Hive plug-in. By default, the value is set to `localhost:51200`.
+2.  If necessary, set the `pxf_service_address` global configuration property to the hostname or IP address and port where you have installed the PXF Hive plug-in. By default, the value is set to `localhost:51200`.
 
     ``` sql
     postgres=# SET pxf_service_address TO "hivenode:51200"
@@ -151,152 +435,85 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
     postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
     ``` 
     
-3.  To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
 
-    ``` sql
-    postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-    ```
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
+
+For example:
 
-    For example:
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
 
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+-   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
+
+    ``` shell
+    $ psql -d postgres
+    ```
+    
     ``` sql
-    postgres=# SELECT * FROM hcatalog.default.sales;
+    postgres=# \d hcatalog.default.sales_info_rcfile
+
+    PXF Hive Table "default.sales_info_rcfile"
+              Column      |  Type  
+    ------------------+--------
+     location         | text
+     month            | text
+     number_of_orders | int4
+     total_sales      | float8
     ```
+-   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e. all tables in `hive-db-name`. A sample follows this list.
+-   Use `\d hcatalog.*.*` to describe the whole schema, i.e. all databases and tables.
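+
+For example, assuming the `default` database contains the tables created earlier in this topic, `\d hcatalog.default.*` output would resemble:
+
+``` shell
+postgres=# \d hcatalog.default.*
+
+PXF Hive Table "default.sales_info"
+      Column      |  Type  
+------------------+--------
+ location         | text
+ month            | text
+ number_of_orders | int4
+ total_sales      | float8
+
+PXF Hive Table "default.sales_info_rcfile"
+      Column      |  Type  
+------------------+--------
+ location         | text
+ month            | text
+ number_of_orders | int4
+ total_sales      | float8
+```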
 
-4.  To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
-    -   Within HAWQ, use either the `\d                                         hcatalog.hive-db-name.hive-table-name` or `\d+                                         hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
-
-        ``` shell
-        $ psql -d postgres
-        postgres=# \d hcatalog.default.test
-
-        PXF Hive Table "default.test"
-            Column    |  Type  
-        --------------+--------
-         name         | text
-         type         | text
-         supplier_key | int4
-         full_price   | float8 
-        ```
-    -   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. For example:
-
-        ``` shell
-        postgres=# \d hcatalog.default.*
-
-        PXF Hive Table "default.test"
-            Column    |  Type  
-        --------------+--------
-         type         | text
-         name         | text
-         supplier_key | int4
-         full_price   | float8
-
-        PXF Hive Table "default.testabc"
-         Column | Type 
-        --------+------
-         type   | text
-         name   | text
-        ```
-    -   Use `\d hcatalog.*.*` to describe the whole schema:
-
-        ``` shell
-        postgres=# \d hcatalog.*.*
-
-        PXF Hive Table "default.test"
-            Column    |  Type  
-        --------------+--------
-         type         | text
-         name         | text
-         supplier_key | int4
-         full_price   | float8
-
-        PXF Hive Table "default.testabc"
-         Column | Type 
-        --------+------
-         type   | text
-         name   | text
-
-        PXF Hive Table "userdb.test"
-          Column  | Type 
-        ----------+------
-         address  | text
-         username | text
-         
-        ```
-
-    **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
-
-5.  Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.
-
-    **Note:** Currently the only supported input profile is `'Hive'`.
-
-    For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf\_get\_item\_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
 
     ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive','default.test');
+    postgres=# select * from pxf_get_item_fields('Hive','default.sales_info_rcfile');
     ```
     
     ``` pre
-      path   | itemname |  fieldname   | fieldtype 
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-    (4 rows)
+      path   |     itemname      |    fieldname     | fieldtype 
+    ---------+-------------------+------------------+-----------
+     default | sales_info_rcfile | location         | text
+     default | sales_info_rcfile | month            | text
+     default | sales_info_rcfile | number_of_orders | int4
+     default | sales_info_rcfile | total_sales      | float8
     ```
 
-    The following statement returns table descriptions from the default database.
+- The following statement returns table descriptions from the default database.
 
     ``` sql
     postgres=# select * from pxf_get_item_fields('Hive','default.*');
     ```
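+
+    Assuming only the tables created earlier in this topic exist in the `default` database, the output would begin:
+
+    ``` pre
+      path   |  itemname  | fieldname | fieldtype 
+    ---------+------------+-----------+-----------
+     default | sales_info | location  | text
+     default | sales_info | month     | text
+     ...
+    ```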
-    
-    ``` pre
-      path   | itemname |  fieldname   | fieldtype 
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-     default | testabc  | name         | text
-     default | testabc  | type         | text
-    (6 rows)
-    ```
 
-    The following statement returns a description of the entire schema.
+- The following statement returns a description of the entire schema.
 
     ``` sql
     postgres=# select * from pxf_get_item_fields('Hive', '*.*');
     ```
-    
-    ``` pre
-      path   | itemname |  fieldname   | fieldtype 
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-     default | testabc  | name         | text
-     default | testabc  | type         | text
-     userdb  | test     | username     | text
-     userdb  | test     | address      | text
-    (8 rows)
-    ```
 
 ### <a id="topic_r5k_pst_25"></a>Limitations
 
 HCatalog integration has the following limitations:
 
--   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive instead. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
+-   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
 -   Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
--   HAWQ reserves the database name "hcatalog" for system use. You cannot connect to or alter the system "hcatalog" database.
+-   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
 
 ## <a id="partitionfiltering"></a>Partition Filtering
 
-The PXF Hive plug-in uses the Hive partitioning feature and directory structure. This enables partition exclusion on HDFS files that contain the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
 
-To take advantage of PXF partition filtering push-down, name the partition fields in the external table. These names must be the same as the names stored in the Hive table. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
 
 **Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
 
@@ -305,27 +522,28 @@ To take advantage of PXF partition filtering push-down, name the partition field
 Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
 
 ``` sql
-hive> CREATE TABLE sales_part (name STRING, type STRING, supplier_key INT, price DOUBLE)
-PARTITIONED BY (delivery_state STRING, delivery_city STRING)
-ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+        PARTITIONED BY (delivery_state string, delivery_city string)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 ```
 
 Load data into this Hive table and add some partitions:
 
 ``` sql
-hive> LOAD DATA LOCAL INPATH '/local/path/data1.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'San Francisco');
-hive> LOAD DATA LOCAL INPATH '/local/path/data2.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento');
-hive> LOAD DATA LOCAL INPATH '/local/path/data3.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA'    , delivery_city = 'Reno');
-hive> LOAD DATA LOCAL INPATH '/local/path/data4.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA'    , delivery_city = 'Las Vegas');
+INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
+INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') VALUES ('cube', 'widget', 11, 1.17);
+INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') VALUES ('dowel', 'widget', 51, 31.82);
+INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') VALUES ('px49', 'pipe', 52, 99.82);
 ```
 
-The Hive storage directory should appears as follows:
+The Hive storage directory structure appears as follows:
 
 ``` pre
-/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=San Francisco/data1.txt
-/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Sacramento/data2.txt
-/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Reno/data3.txt
-/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Las Vegas/data4.txt
+$ sudo -u hdfs hdfs dfs -ls -R /apps/hive/warehouse/sales_part
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Fresno/
+/apps/hive/warehouse/sales_part/delivery_state=CALIFORNIA/delivery_city=Sacramento/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Reno/
+/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city=Las Vegas/
 ```
 
 To define a HAWQ PXF table that will read this Hive table and take advantage of partition filter push-down, define the fields corresponding to the Hive partition fields at the end of the attribute list. In HiveQL, a `select                         *` statement on a partitioned table shows the partition fields at the end of the record.
@@ -339,7 +557,7 @@ postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
   delivery_state TEXT, 
   delivery_city TEXT
 )
-LOCATION ('pxf://namenode_host:51200/sales_part?Profile=Hive')
+LOCATION ('pxf://namenode:51200/sales_part?PROFILE=Hive')
 FORMAT 'custom' (FORMATTER='pxfwritable_import');
 
 postgres=# SELECT * FROM pxf_sales_part;
@@ -350,7 +568,7 @@ postgres=# SELECT * FROM pxf_sales_part;
 In the following example, the HAWQ query filters the `delivery_city` partition `Sacramento`. The filter on `item_name` is not pushed down, since it is not a partition column. It is performed on the HAWQ side after all the data on `Sacramento` is transferred for processing.
 
 ``` sql
-postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'shirt';
+postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'cube';
 ```
 
 ### <a id="example4"></a>Example
@@ -372,17 +590,17 @@ Similar to Hive, PXF represents a table's partitioning columns as columns that a
 Consider a Hive partitioned table that is created with the statement:
 
 ``` sql
-hive> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (date date);
+hive> CREATE TABLE sales (order_id bigint, order_amount float) PARTITIONED BY (xdate date);
 ```
 
 The table is loaded with five rows that contain the following data:
 
 ``` pre
-1    1.0    1900-01-01
-2    2.2    1994-04-14
-3    3.3    2011-03-31
-4    4.5    NULL
-5    5.0    2013-12-06
+1    1.0    1900-01-01
+2    2.2    1994-04-14
+3    3.3    2011-03-31
+4    4.5    NULL
+5    5.0    2013-12-06
 ```
 
 In this case, the insertion of row 4 creates a Hive default partition, because the partition column "date" contains a null value.
@@ -390,7 +608,7 @@ In this case, the insertion of row 4 creates a Hive default partition, because t
 In Hive, any query that filters on the partition column omits data in the default partition. For example, the following query returns no rows:
 
 ``` sql
-hive> select * from sales where date is null;
+hive> select * from sales where xdate is null;
 ```
 
 However, if you map this table as a PXF external table in HAWQ, all default partition values are translated into actual NULL values. In HAWQ, executing the same query against the PXF table returns row 4 as the result, because the filter matches the NULL value.
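+
+A minimal sketch of the HAWQ side follows; the external table name `pxf_sales_default_part` is hypothetical, and the columns mirror the Hive `sales` table above (Hive `float` maps to HAWQ `float4`):
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_sales_default_part (order_id int8, order_amount float4, xdate date)
+             LOCATION ('pxf://namenode:51200/default.sales?PROFILE=Hive')
+             FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+postgres=# SELECT * FROM pxf_sales_default_part WHERE xdate IS NULL;
+```
+
+The `SELECT` returns row 4, whereas the equivalent Hive query returns no rows.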
@@ -399,19 +617,18 @@ Keep this behavior in mind when executing IS NULL queries on Hive partitioned ta
 
 ## <a id="topic_dbb_nz3_ts"></a>Accessing Hive Tables in Parquet Format
 
-The PXF Hive profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, in Hive if a table is created using:
+The PXF Hive profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
 
 ``` sql
-hive> create table hive_parquet_table (fname string, lname string, custid int, acctbalance double) stored as
-      parquet;
+hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double) 
+        STORED AS parquet;
 ```
 
-Then you would define the HAWQ external table using:
+Define the HAWQ external table using:
 
 ``` sql
-postgres=# create external table pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision) 
-    location ('pxf://localhost:51200/hive-db-name.hive_parquet_table?profile=Hive') 
-    format 'custom' (formatter='pxfwritable_import'); 
+postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision) 
+    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive') 
+    FORMAT 'CUSTOM' (formatter='pxfwritable_import'); 
 ```
 
-



[02/14] incubator-hawq-docs git commit: PXF Hive plug-in mods

Posted by yo...@apache.org.
PXF Hive plug-in mods


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/457d703a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/457d703a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/457d703a

Branch: refs/heads/develop
Commit: 457d703a3f5c057e241acf985fbc35da34f6a075
Parents: 0398a62
Author: Lisa Owen <lo...@pivotal.io>
Authored: Mon Sep 26 15:40:10 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Mon Sep 26 15:40:10 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 141 ++++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 69 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/457d703a/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 55a0aeb..a1c26c3 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -61,14 +61,14 @@ The following table summarizes external mapping rules for Hive primitive types.
 
 ### <a id="topic_b4v_g3n_25"></a>Complex Data Types
 
-Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`.  While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract components of complex data types.
+Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`.  While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
 
-A complex data type example is provided later in this section.
+A complex data type example is provided later in this topic.
 
 
 ## <a id="hive_sampledataset"></a>Sample Data Set
 
-Examples used in this section will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
+Examples used in this topic will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
 
 - location - text
 - month - text
@@ -117,10 +117,10 @@ hive> CREATE TABLE sales_info (location string, month string,
 
 Notice:
 
-- The `STORED AS textfile` subclause instructs Hive to create the table in TextFile (the default) format.  Hive textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
+- The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
 - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
 
-Load the `pxf_hive_datafile.txt` sample data file into the newly-created `sales_info` table:
+Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` table you just created:
 
 ``` sql
 hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt' 
@@ -137,6 +137,8 @@ In examples later in this section, you will access the `sales_info` Hive table d
 
 ## <a id="topic_p2s_lvl_25"></a>Querying External Hive Data
 
+The PXF Hive plug-in supports several Hive-related profiles. These include `Hive`, `HiveText`, and `HiveRC`.
+
 Use the following syntax to create a HAWQ external table representing Hive data:
 
 ``` sql
@@ -160,7 +162,7 @@ Hive-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` ca
 | FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The delimiter value must be specified a second time in the `delimiter='delim'` formatting option. |
 
 
-## <a id="profile_hive"></a>Example: Hive Profile 
+### <a id="profile_hive"></a>Hive Profile 
 
 The `Hive` profile works with any Hive file format table.
 
@@ -190,7 +192,7 @@ postgres=# select * from salesinfo_hiveprofile;
 
 ```
 
-## <a id="profile_hivetext"></a>Example: HiveText Profile
+### <a id="profile_hivetext"></a>HiveText Profile
 
 Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile. 
 
@@ -214,7 +216,7 @@ Notice:
 Query the external table:
 
 ``` sql
-postgres=# select * from salesinfo_hivetextprofile where location='Beijing'; 
+postgres=# SELECT * FROM salesinfo_hivetextprofile WHERE location='Beijing'; 
 ```
 
 ``` shell                     
@@ -225,9 +227,9 @@ postgres=# select * from salesinfo_hivetextprofile where location="Beijing";
 (2 rows)
 ```
 
-## <a id="profile_hiverc"></a>Example: HiveRC Profile
+### <a id="profile_hiverc"></a>HiveRC Profile
 
-### <a id="profile_hiverc_rcfiletbl"></a>RCFile Format Hive Table
+#### <a id="profile_hiverc_rcfiletbl"></a>RCFile Format Hive Table
 
 The RCFile Hive format is used for row columnar formatted data. Create a Hive table with RCFile format:
 
@@ -253,7 +255,7 @@ A copy of the sample data set is now stored in RCFile format in `sales_info_rcfi
 ``` sql
 hive> SELECT * FROM sales_info_rcfile;
 ```
-### <a id="profile_hiverc_pxfquery"></a>PXF Query
+#### <a id="profile_hiverc_pxfquery"></a>PXF Query
 
 Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile. 
 
@@ -272,7 +274,7 @@ postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month te
 Query the external table:
 
 ``` sql
-postgres=# select location, total_sales from salesinfo_hivercprofile; 
+postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile; 
 ```
 
 ``` shell                     
@@ -285,22 +287,39 @@ postgres=# select location, total_sales from salesinfo_hivercprofile;
  ...
 ```
 
-## <a id="profileperf"></a>Profile Performance Considerations
+### <a id="topic_dbb_nz3_ts"></a>Accessing Parquet-Format Hive Tables
+
+The PXF `Hive` profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
+
+``` sql
+hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double) 
+        STORED AS parquet;
+```
+
+Define the HAWQ external table using:
+
+``` sql
+postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision) 
+    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive') 
+    FORMAT 'CUSTOM' (formatter='pxfwritable_import'); 
+```
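+
+Then query the external table as you would any HAWQ table; for example:
+
+``` sql
+postgres=# SELECT fname, lname, acctbalance FROM pxf_parquet_table;
+```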
+
+### <a id="profileperf"></a>Profile Performance Considerations
 
 The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
 
 They are optimized for their specific file formats; the generic `Hive` profile must instead determine the underlying file format of the data at runtime.
 
-## <a id="complex_dt_example"></a>Example: Complex Data Types
+### <a id="complex_dt_example"></a>Complex Data Types Example
 
-Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a key/value pair map. 
+Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a string key/value pair map. 
 
 The data set includes fields with the following names and data types:
 
 - index - int
 - name - string
 - intarray - array of integers
-- propmap - map of string keys and string value pairs
+- propmap - map of string key and value pairs
 
 When specifying an array field in a Hive table, you must identify the terminator for each item in the collection. Similarly, the map key termination character must also be specified.
 
@@ -332,20 +351,20 @@ $ HADOOP_USER_NAME=hdfs hive
 ```
 
 ``` sql
-CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<INT>, propmap MAP<string, string>) 
-  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
-  COLLECTION ITEMS TERMINATED BY '%' 
-  MAP KEYS TERMINATED BY ':' 
-  STORED AS TEXTFILE;
+hive> CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>) 
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+        COLLECTION ITEMS TERMINATED BY '%' 
+        MAP KEYS TERMINATED BY ':' 
+        STORED AS TEXTFILE;
 ```
 
 Notice:
 
-- The `FIELDS TERMINATED BY` identifies a comma as the field terminator.
-- The `COLLECTION ITEMS TERMINATED BY` specifies the percent sign as the collection items (array item, map key/value pair) terminator.
-- The `MAP KEYS TERMINATED BY` ientifies a colon as the terminator for map keys.
+- `FIELDS TERMINATED BY` identifies a comma as the field terminator.
+- The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
+- `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map keys.
 
-Load the `pxf_hive_complex.txt` sample data file into the newly-created `table_complextypes` table:
+Load the `pxf_hive_complex.txt` sample data file into the `table_complextypes` table you just created:
 
 ``` sql
 hive> LOAD DATA local INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
@@ -364,7 +383,7 @@ hive> SELECT * FROM table_complextypes;
 ...
 ```
 
-Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive table_complextypes:
+Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive `table_complextypes`:
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
@@ -372,12 +391,12 @@ postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text,
              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```
 
-Notice that the integer array and map complex types are mapped to PXF type text.
+Notice that the integer array and map complex types are mapped to type text.
 
 Query the external table:
 
 ``` sql
-select * from complextypes_hiveprofile; 
+postgres=# SELECT * FROM complextypes_hiveprofile; 
 ```
 
 ``` shell                      
@@ -396,6 +415,7 @@ select * from complextypes_hiveprofile;
 (10 rows)
 ```
 
+`intarray` and `propmap` are each text strings.
 
 ## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
 
@@ -458,9 +478,11 @@ To obtain a description of a Hive table with HCatalog integration, you can use t
     
     ``` sql
     postgres=# \d hcatalog.default.sales_info_rcfile
-
+    ```
+    
+    ``` shell
     PXF Hive Table "default.sales_info_rcfile"
-              Column      |  Type  
+          Column      |  Type  
     ------------------+--------
      location         | text
      month            | text
@@ -472,12 +494,12 @@ To obtain a description of a Hive table with HCatalog integration, you can use t
 
 **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
 
-Alternatively, you can use the `pxf\_get\_item\_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only supported input profile at this time is `'Hive'`.
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only supported input profile at this time is `'Hive'`.
 
 - The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
 
     ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive','default.sales_info_rcfile');
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
     ```
     
     ``` pre
@@ -492,32 +514,32 @@ Alternatively, you can use the `pxf\_get\_item\_fields` user-defined function (U
 - The following statement returns table descriptions from the default database.
 
     ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive','default.*');
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
     ```
 
 - The following statement returns a description of the entire schema.
 
     ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive', '*.*');
+    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
     ```
 
 ### <a id="topic_r5k_pst_25"></a>Limitations
 
 HCatalog integration has the following limitations:
 
--   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
--   Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
+-   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+-   Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
 -   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
 
 ## <a id="partitionfiltering"></a>Partition Filtering
 
-The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
 
-To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
 
 **Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
 
-### <a id="example2"></a>Example
+### <a id="example2"></a>Create Partitioned Hive Table
 
 Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
 
@@ -530,13 +552,13 @@ hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price
 Load data into this Hive table and add some partitions:
 
 ``` sql
-INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
-INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') VALUES ('cube', 'widget', 11, 1.17);
-INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') VALUES ('dowel', 'widget', 51, 31.82);
-INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') VALUES ('px49', 'pipe', 52, 99.82);
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') VALUES ('cube', 'widget', 11, 1.17);
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') VALUES ('dowel', 'widget', 51, 31.82);
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') VALUES ('px49', 'pipe', 52, 99.82);
 ```
 
-The Hive storage directory structure appears as follows:
+The Hive storage directory structure for the `sales_part` table appears as follows:
 
 ``` pre
 $ sudo -u hdfs hdfs dfs -ls -R /apps/hive/warehouse/sales_part
@@ -546,7 +568,7 @@ $ sudo -u hdfs hdfs dfs -ls -R /apps/hive/warehouse/sales_part
/apps/hive/warehouse/sales_part/delivery_state=NEVADA/delivery_city='Las Vegas'/
 ```
 
-To define a HAWQ PXF table that will read this Hive table and take advantage of partition filter push-down, define the fields corresponding to the Hive partition fields at the end of the attribute list. In HiveQL, a `select                         *` statement on a partitioned table shows the partition fields at the end of the record.
+To define a HAWQ PXF table that will read this Hive table and take advantage of partition filter push-down, define the fields corresponding to the Hive partition fields at the end of the `CREATE EXTERNAL TABLE` attribute list. In HiveQL, a `SELECT *` statement on a partitioned table shows the partition fields at the end of the record.
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE pxf_sales_part(
@@ -563,7 +585,7 @@ FORMAT 'custom' (FORMATTER='pxfwritable_import');
 postgres=# SELECT * FROM pxf_sales_part;
 ```
 
-### <a id="example3"></a>Example
+### <a id="example3"></a>Query Without Pushdown
 
In the following example, the HAWQ query filters on the `delivery_city` partition `Sacramento`. The filter on `item_name` is not pushed down, since it is not a partition column; it is applied on the HAWQ side after all of the `Sacramento` partition data is transferred for processing.
 
@@ -571,7 +593,7 @@ In the following example, the HAWQ query filters the `delivery_city` partition `
 postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND item_name = 'cube';
 ```
 
-### <a id="example4"></a>Example
+### <a id="example4"></a>Query With Pushdown
 
The following HAWQ query reads all the data under `delivery_state` partition `CALIFORNIA`, regardless of the city.
 
@@ -585,8 +607,6 @@ This topic describes a difference in query results between Hive and PXF queries
 
Similar to Hive, PXF represents a table's partitioning columns as columns that are appended to the end of the table. However, PXF translates any column value in a default partition to a NULL value. This means that a HAWQ query that includes an `IS NULL` filter on a partitioning column can return different results than the same Hive query.
 
-### <a id="topic_g4r_4wh_1s"></a>Example
-
 Consider a Hive partitioned table that is created with the statement:
 
 ``` sql
@@ -603,32 +623,15 @@ The table is loaded with five rows that contain the following data:
 5.0    2013-12-06
 ```
 
-In this case, the insertion of row 4 creates a Hive default partition, because the partition column "date" contains a null value.
+The insertion of row 4 creates a Hive default partition, because the partition column `xdate` contains a null value.
 
-In Hive, any query that filters on the partition column omits data in the default partition. For example the following query returns no rows:
+In Hive, any query that filters on the partition column omits data in the default partition. For example, the following query returns no rows:
 
 ``` sql
-hive> select * from sales where xdate is null;
+hive> SELECT * FROM sales WHERE xdate IS NULL;
 ```
 
 However, if you map this table as a PXF external table in HAWQ, all default partition values are translated into actual NULL values. In HAWQ, executing the same query against the PXF table returns row 4 as the result, because the filter matches the NULL value.
 
Keep this behavior in mind when executing `IS NULL` queries on Hive partitioned tables.
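
To see this from the HAWQ side, a minimal sketch (assuming the Hive `sales` table above declares `id int` and `amount double` columns partitioned by `xdate date`, and substituting your NameNode host):

``` sql
-- assumed column layout of the Hive sales table described above
postgres=# CREATE EXTERNAL TABLE pxf_sales(id int, amount float8, xdate date)
             LOCATION ('pxf://namenode:51200/default.sales?PROFILE=Hive')
             FORMAT 'CUSTOM' (formatter='pxfwritable_import');
postgres=# SELECT * FROM pxf_sales WHERE xdate IS NULL;
```

Unlike the Hive query above, this query returns the row stored in the default partition.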
 
-## <a id="topic_dbb_nz3_ts"></a>Accessing Hive Tables in Parquet Format
-
-The PXF Hive profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
-
-``` sql
-hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double) 
-        STORED AS parquet;
-```
-
-Define the HAWQ external table using:
-
-``` sql
-postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision) 
-    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive') 
-    FORMAT 'CUSTOM' (formatter='pxfwritable_import'); 
-```
-


[10/14] incubator-hawq-docs git commit: note/notice

Posted by yo...@apache.org.
note/notice


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/2a38a032
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/2a38a032
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/2a38a032

Branch: refs/heads/develop
Commit: 2a38a0322abda804cfd4fc8aa39f142f0d83ea11
Parents: 284c3ec
Author: Lisa Owen <lo...@pivotal.io>
Authored: Wed Oct 26 10:20:28 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Wed Oct 26 10:20:28 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/2a38a032/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 82fcc25..da851db 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -130,7 +130,7 @@ Create a Hive table to expose our sample data set.
             STORED AS textfile;
     ```
 
-    Notice:
+    Notice that:
 
     - The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
     - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
@@ -229,7 +229,7 @@ Use the PXF `HiveText` profile to create a queryable HAWQ external table from th
 
     (You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
 
-    Notice:
+    Notice that:
 
     - The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
     - The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` escapes the character.
@@ -380,7 +380,7 @@ When specifying an array field in a Hive table, you must identify the terminator
              STORED AS TEXTFILE;
     ```
 
-    Notice:
+    Notice that:
 
     - `FIELDS TERMINATED BY` identifies a comma as the field terminator.
     - The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
@@ -514,7 +514,7 @@ To obtain a description of a Hive table with HCatalog integration, you can use t
 -   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e. all tables in `hive-db-name`.
 -   Use `\d hcatalog.*.*` to describe the whole schema, i.e. all databases and tables.
 
-**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
+When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
 
 Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only supported input profile at this time is `'Hive'`.
 
@@ -559,7 +559,7 @@ The PXF Hive plug-in supports the Hive partitioning feature and directory struct
 
To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names should be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
 
-**Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
 
 ### <a id="example2"></a>Create Partitioned Hive Table
 


[05/14] incubator-hawq-docs git commit: link to CREATE EXTERNAL TABLE ref page

Posted by yo...@apache.org.
link to CREATE EXTERNAL TABLE ref page


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/150fa678
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/150fa678
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/150fa678

Branch: refs/heads/develop
Commit: 150fa67857871d58ea05eb14c023215c932ab7b1
Parents: 8c986b6
Author: Lisa Owen <lo...@pivotal.io>
Authored: Tue Oct 11 12:03:39 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Tue Oct 11 12:03:39 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/150fa678/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 581377f..1cce751 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -155,7 +155,7 @@ LOCATION ('pxf://host[:port]/hive-db-name.hive-table-name
 FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='delim')
 ```
 
-Hive-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` call are described below.
+Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described below.
 
 | Keyword  | Value |
 |-------|-------------------------------------|


[03/14] incubator-hawq-docs git commit: clean up some extra whitespace

Posted by yo...@apache.org.
clean up some extra whitespace


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/822d7545
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/822d7545
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/822d7545

Branch: refs/heads/develop
Commit: 822d7545e746490e55507866c62dca5ea2d5349a
Parents: 457d703
Author: Lisa Owen <lo...@pivotal.io>
Authored: Mon Oct 3 15:19:03 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Mon Oct 3 15:19:03 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 98 ++++++++++++++++++++++----------------------
 1 file changed, 49 insertions(+), 49 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/822d7545/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index a1c26c3..e1a7bce 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -2,7 +2,7 @@
 title: Accessing Hive Data
 ---
 
-Apache Hive is a distributed data warehousing infrastructure.  Hive facilitates managing large data sets, supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. 
+Apache Hive is a distributed data warehousing infrastructure.  Hive facilitates managing large data sets, supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase.
 
 This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
 
@@ -45,13 +45,13 @@ The following table summarizes external mapping rules for Hive primitive types.
 |-------|---------------------------|
 | boolean    | bool |
 | int   | int4 |
-| smallint   | int2 |  
-| tinyint   | int2 | 
-| bigint   | int8 | 
+| smallint   | int2 |
+| tinyint   | int2 |
+| bigint   | int8 |
 | decimal  |  numeric  |
-| float   | float4 | 
-| double   | float8 | 
-| string   | text | 
+| float   | float4 |
+| double   | float8 |
+| string   | text |
 | binary   | bytea |
 | char   | bpchar |
 | varchar   | varchar |
@@ -106,12 +106,12 @@ The Hive command line is a subsystem similar to that of `psql`. To start the Hiv
 $ HADOOP_USER_NAME=hdfs hive
 ```
 
-The default Hive database is named `default`. Create a Hive table named `sales_info` to expose our sample data set. Create this table in the `default` database: 
+The default Hive database is named `default`. Create a Hive table named `sales_info` to expose our sample data set. Create this table in the `default` database:
 
 ``` sql
 hive> CREATE TABLE sales_info (location string, month string,
-        number_of_orders int, total_sales double) 
-        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+        number_of_orders int, total_sales double)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         STORED AS textfile;
 ```
 
@@ -123,7 +123,7 @@ Notice:
 Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` table you just created:
 
 ``` sql
-hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt' 
+hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt'
         INTO TABLE sales_info;
 ```
 
@@ -142,10 +142,10 @@ The PXF Hive plug-in supports several Hive-related profiles. These include `Hive
 Use the following syntax to create a HAWQ external table representing Hive data:
 
 ``` sql
-CREATE EXTERNAL TABLE table_name 
+CREATE EXTERNAL TABLE table_name
     ( column_name data_type [, ...] | LIKE other_table )
 LOCATION ('pxf://host[:port]/hive-db-name.hive-table-name
-    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=delim']) 
+    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=delim'])
 FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='delim')
 ```
 
@@ -156,13 +156,13 @@ Hive-plug-in-specific keywords and values used in the `CREATE EXTERNAL TABLE` ca
 | host    | The HDFS NameNode. |
 | hive-db-name    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
 | hive-table-name    | Name of the Hive table. |
-| PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |  
+| PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
 | DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  `delim` must be a single ascii character or specified in hexadecimal representation. |
 | FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |
 | FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in `delim`. |
 
 
-### <a id="profile_hive"></a>Hive Profile 
+### <a id="profile_hive"></a>Hive Profile
 
 The `Hive` profile works with any Hive file format table.
 
@@ -179,11 +179,11 @@ postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text
 ```
 
 ``` sql
-postgres=# select * from salesinfo_hiveprofile; 
+postgres=# select * from salesinfo_hiveprofile;
 ```
 
-``` shell                     
-   location    | month | num_orders | total_sales 
+``` shell
+   location    | month | num_orders | total_sales
 ---------------+-------+------------+-------------
  Prague        | Jan   |        101 |     4875.33
  Rome          | Mar   |         87 |     1557.39
@@ -194,7 +194,7 @@ postgres=# select * from salesinfo_hiveprofile;
 
 ### <a id="profile_hivetext"></a>HiveText Profile
 
-Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile. 
+Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile.
 
 **Note**: When using the `HiveText` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
 
@@ -202,7 +202,7 @@ Use the PXF `HiveText` profile to create a queryable HAWQ external table from th
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
-             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c') 
+             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
              FORMAT 'TEXT' (delimiter=E',');
 ```
 
@@ -216,11 +216,11 @@ Notice:
 Query the external table:
 
 ``` sql
-postgres=# SELECT * FROM salesinfo_hivetextprofile where location="Beijing"; 
+postgres=# SELECT * FROM salesinfo_hivetextprofile WHERE location='Beijing';
 ```
 
-``` shell                     
- location | month | num_orders | total_sales 
+``` shell
+ location | month | num_orders | total_sales
 ----------+-------+------------+-------------
  Beijing  | Jul   |        411 |    11600.67
  Beijing  | Dec   |        100 |     4248.41
@@ -239,8 +239,8 @@ $ HADOOP_USER_NAME=hdfs hive
 
 ``` sql
 hive> CREATE TABLE sales_info_rcfile (location string, month string,
-        number_of_orders int, total_sales double) 
-        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
+        number_of_orders int, total_sales double)
+        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         STORED AS rcfile;
 ```
 
@@ -257,7 +257,7 @@ hive> SELECT * FROM sales_info_rcfile;
 ```
 #### <a id="profile_hiverc_pxfquery"></a>PXF Query
 
-Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile. 
+Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile.
 
 **Note**: When using the `HiveRC` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
 
@@ -265,7 +265,7 @@ Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
-             LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c') 
+             LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c')
              FORMAT 'TEXT' (delimiter=E',');
 ```
 
@@ -274,11 +274,11 @@ postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month te
 Query the external table:
 
 ``` sql
-postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile; 
+postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
 ```
 
-``` shell                     
-   location    | total_sales 
+``` shell
+   location    | total_sales
 ---------------+-------------
  Prague        |     4875.33
  Rome          |     1557.39
@@ -292,16 +292,16 @@ postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
 The PXF `Hive` profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
 
 ``` sql
-hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double) 
+hive> CREATE TABLE hive_parquet_table (fname string, lname string, custid int, acctbalance double)
         STORED AS parquet;
 ```
 
 Define the HAWQ external table using:
 
 ``` sql
-postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision) 
-    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive') 
-    FORMAT 'CUSTOM' (formatter='pxfwritable_import'); 
+postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, custid int, acctbalance double precision)
+    LOCATION ('pxf://namenode:51200/hive-db-name.hive_parquet_table?profile=Hive')
+    FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```
 
 ### <a id="profileperf"></a>Profile Performance Considerations
@@ -312,7 +312,7 @@ The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
 
 ### <a id="complex_dt_example"></a>Complex Data Types Example
 
-Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a string key/value pair map. 
+Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a string key/value pair map.
 
 The data set includes fields with the following names and data types:
 
@@ -351,10 +351,10 @@ $ HADOOP_USER_NAME=hdfs hive
 ```
 
 ``` sql
-postgres=# CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>) 
-             ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
-             COLLECTION ITEMS TERMINATED BY '%' 
-             MAP KEYS TERMINATED BY ':' 
+postgres=# CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>)
+             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+             COLLECTION ITEMS TERMINATED BY '%'
+             MAP KEYS TERMINATED BY ':'
              STORED AS TEXTFILE;
 ```
 
@@ -387,7 +387,7 @@ Use the PXF `Hive` profile to create a queryable HAWQ external table representin
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
-             LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive') 
+             LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive')
              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```
 
@@ -396,11 +396,11 @@ Notice that the integer array and map complex types are mapped to type text.
 Query the external table:
 
 ``` sql
-postgres=# SELECT * FROM complextypes_hiveprofile; 
+postgres=# SELECT * FROM complextypes_hiveprofile;
 ```
 
-``` shell                      
- index |    name    | intarray |              propmap               
+``` shell     
+ index |    name    | intarray |              propmap
 -------+------------+----------+------------------------------------
      3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
     89 | Rome       | [4,5,6]  | {"zone":"euro"}
@@ -421,7 +421,7 @@ postgres=# SELECT * FROM complextypes_hiveprofile;
 
 Hive tables can be queried directly through HCatalog integration with HAWQ and PXF, regardless of the underlying file storage format.
 
-In previous sections, you created an external table in PXF that described the target table's Hive metadata. Another option for querying Hive tables is to take advantage of HAWQ's integration with HCatalog. This integration allows HAWQ to directly use table metadata stored in HCatalog. 
+In previous sections, you created an external table in PXF that described the target table's Hive metadata. Another option for querying Hive tables is to take advantage of HAWQ's integration with HCatalog. This integration allows HAWQ to directly use table metadata stored in HCatalog.
 
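For example, a minimal sketch of a direct HCatalog query (assuming the `sales_info` table created earlier; `hcatalog` is the reserved database name noted under Limitations):

``` sql
postgres=# SELECT * FROM hcatalog.default.sales_info;
```
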
 HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides several advantages:
 
@@ -475,14 +475,14 @@ To obtain a description of a Hive table with HCatalog integration, you can use t
     ``` shell
     $ psql -d postgres
     ```
-    
+
     ``` sql
     postgres=# \d hcatalog.default.sales_info_rcfile;
     ```
-    
+
     ``` shell
     PXF Hive Table "default.sales_info_rcfile"
-          Column      |  Type  
+          Column      |  Type
     ------------------+--------
      location         | text
      month            | text
@@ -501,9 +501,9 @@ Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF)
     ``` sql
     postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
     ```
-    
+
     ``` pre
-      path   |     itemname      |    fieldname     | fieldtype 
+      path   |     itemname      |    fieldname     | fieldtype
     ---------+-------------------+------------------+-----------
      default | sales_info_rcfile | location         | text
      default | sales_info_rcfile | month            | text


[13/14] incubator-hawq-docs git commit: remove suspect data types from mapping table

Posted by yo...@apache.org.
remove suspect data types from mapping table


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/cd50c598
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/cd50c598
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/cd50c598

Branch: refs/heads/develop
Commit: cd50c5982912c429b9216c9ea9cf2ac1442bd87a
Parents: 67958d6
Author: Lisa Owen <lo...@pivotal.io>
Authored: Thu Oct 27 14:15:36 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Thu Oct 27 14:15:36 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 4 ----
 1 file changed, 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/cd50c598/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index d1c803b..ae68a99 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -50,15 +50,11 @@ The following table summarizes external mapping rules for Hive primitive types.
 | smallint   | int2 |
 | tinyint   | int2 |
 | bigint   | int8 |
-| decimal  |  numeric  |
 | float   | float4 |
 | double   | float8 |
 | string   | text |
 | binary   | bytea |
-| char   | bpchar |
-| varchar   | varchar |
 | timestamp   | timestamp |
-| date   | date |
 
 
 ### <a id="topic_b4v_g3n_25"></a>Complex Data Types


[06/14] incubator-hawq-docs git commit: sql commands all caps

Posted by yo...@apache.org.
sql commands all caps


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/5cdd8f8c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/5cdd8f8c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/5cdd8f8c

Branch: refs/heads/develop
Commit: 5cdd8f8c35a51360fe3bfdedeff796bf1e0f31f3
Parents: 150fa67
Author: Lisa Owen <lo...@pivotal.io>
Authored: Tue Oct 11 13:27:17 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Tue Oct 11 13:27:17 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/5cdd8f8c/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 1cce751..6a7d9af 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -185,7 +185,7 @@ postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text
 ```
 
 ``` sql
-postgres=# select * from salesinfo_hiveprofile;
+postgres=# SELECT * FROM salesinfo_hiveprofile;
 ```
 
 ``` shell


[08/14] incubator-hawq-docs git commit: fix some duplicate ids

Posted by yo...@apache.org.
fix some duplicate ids


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/54b2c01a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/54b2c01a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/54b2c01a

Branch: refs/heads/develop
Commit: 54b2c01a80d477cc093d7eb1ed2aa8c0bf762d36
Parents: 67e8b96
Author: Lisa Owen <lo...@pivotal.io>
Authored: Fri Oct 21 17:16:24 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Fri Oct 21 17:16:24 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/54b2c01a/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 85e85e7..d4b8f01 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -39,7 +39,7 @@ The PXF Hive plug-in supports the following profiles for accessing the Hive file
 - `HiveText`
 - `HiveRC`
 
-## <a id="topic_p2s_lvl_25"></a>Data Type Mapping
+## <a id="topic_p2s_lvl_29"></a>Data Type Mapping
 
 ### <a id="hive_primdatatypes"></a>Primitive Data Types
 
@@ -141,7 +141,7 @@ hive> SELECT * FROM sales_info;
 
 In examples later in this section, you will access the `sales_info` Hive table directly via PXF. You will also insert `sales_info` data into tables of other Hive file format types, and use PXF to access those directly as well.
 
-## <a id="topic_p2s_lvl_25"></a>Querying External Hive Data
+## <a id="topic_p2s_lvl_28"></a>Querying External Hive Data
 
 The PXF Hive plug-in supports several Hive-related profiles. These include `Hive`, `HiveText`, and `HiveRC`.
 


[12/14] incubator-hawq-docs git commit: incorporate kavinder's comments

Posted by yo...@apache.org.
incorporate kavinder's comments

- add new section "Configure Partition Filtering Push-Down
- elaborate on Hive profile user for multiple file format types


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/67958d62
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/67958d62
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/67958d62

Branch: refs/heads/develop
Commit: 67958d628c8a2b46ceeb94904ad5012191f46d72
Parents: 8ee05a3
Author: Lisa Owen <lo...@pivotal.io>
Authored: Thu Oct 27 14:08:22 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Thu Oct 27 14:08:22 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/67958d62/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 7d77bc5..d1c803b 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -176,9 +176,9 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 
 ## <a id="profile_hive"></a>Hive Profile
 
-The `Hive` profile works with any Hive file format.
+The `Hive` profile works with any Hive file format. It can access heterogeneous format data in a single table where each partition may be stored as a different file format.
 
-While you can use the `Hive` profile to access any file format, the more specific profiles perform better for those specific file types.
+While you can use the `Hive` profile to access any file format, the more specific profiles perform better for tables stored in a single file format.
 
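As a hedged HiveQL sketch (hypothetical table and partition names, not from this document), one partition of a table can be stored in a different file format, and the `Hive` profile still reads the entire table:

``` sql
hive> CREATE TABLE sales_by_year (name string, amount double)
        PARTITIONED BY (yr string)
        STORED AS textfile;
hive> ALTER TABLE sales_by_year ADD PARTITION (yr='2016');
-- assumption: switch just the 2016 partition to RCFile format
hive> ALTER TABLE sales_by_year PARTITION (yr='2016') SET FILEFORMAT rcfile;
```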
 
 ### <a id="profile_hive_using"></a>Example: Using the Hive Profile
@@ -211,7 +211,7 @@ Use the `Hive` profile to create a queryable HAWQ external table from the Hive `
 
 ## <a id="profile_hivetext"></a>HiveText Profile
 
-Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile.
+Use the `HiveText` profile to query text format files. The `HiveText` profile is more performant than the `Hive` profile.
 
 **Note**: When using the `HiveText` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
 
@@ -325,6 +325,13 @@ postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, cust
     FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```
 
+And query the HAWQ external table using:
+
+``` sql
+postgres=# SELECT fname,lname FROM pxf_parquet_table;
+```
+
+
 ## <a id="profileperf"></a>Profile Performance Considerations
 
 The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
@@ -561,6 +568,19 @@ To take advantage of PXF partition filtering push-down, the Hive and PXF partiti
 
 **Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
 
+### <a id="partitionfiltering_pushdowncfg"></a>Configure Partition Filtering Push-Down
+
+PXF partition filtering push-down is enabled by default. To disable PXF partition filtering push-down, set the `pxf_enable_filter_pushdown` HAWQ server configuration parameter to `off`:
+
+``` sql
+postgres=# show pxf_enable_filter_pushdown;
+ pxf_enable_filter_pushdown
+-----------------------------
+ on
+(1 row)
+postgres=# set pxf_enable_filter_pushdown=off;
+```
+
 ### <a id="example2"></a>Create Partitioned Hive Table
 
Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
@@ -628,6 +648,7 @@ postgres=# SELECT * FROM pxf_sales_part WHERE delivery_city = 'Sacramento' AND i
The following HAWQ query reads all the data under `delivery_state` partition `CALIFORNIA`, regardless of the city.
 
 ``` sql
+postgres=# set pxf_enable_filter_pushdown=on;
 postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'CALIFORNIA';
 ```
 


[07/14] incubator-hawq-docs git commit: use <> for optional args

Posted by yo...@apache.org.
use <> for optional args


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/67e8b969
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/67e8b969
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/67e8b969

Branch: refs/heads/develop
Commit: 67e8b9699c9eec64d04ce9e6048ffb385f7f3573
Parents: 5cdd8f8
Author: Lisa Owen <lo...@pivotal.io>
Authored: Tue Oct 11 13:33:35 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Tue Oct 11 13:33:35 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/67e8b969/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index 6a7d9af..85e85e7 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -148,9 +148,9 @@ The PXF Hive plug-in supports several Hive-related profiles. These include `Hive
 Use the following syntax to create a HAWQ external table representing Hive data:
 
 ``` sql
-CREATE EXTERNAL TABLE table_name
-    ( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/hive-db-name.hive-table-name
+CREATE EXTERNAL TABLE <table_name>
+    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+LOCATION ('pxf://<host>[:<port>]/<hive-db-name>.<hive-table-name>
     ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=delim'])
 FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='delim')
 ```
@@ -159,9 +159,9 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 
 | Keyword  | Value |
 |-------|-------------------------------------|
-| host    | The HDFS NameNode. |
-| hive-db-name    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
-| hive-table-name    | Name of the Hive table. |
+| \<host\>    | The HDFS NameNode. |
+| \<hive-db-name\>    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
+| \<hive-table-name\>    | Name of the Hive table. |
 | PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
 | DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  `delim` must be a single ascii character or specified in hexadecimal representation. |
 | FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |


[04/14] incubator-hawq-docs git commit: include list of hive profile names in file format section

Posted by yo...@apache.org.
include list of hive profile names in file format section


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/8c986b60
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/8c986b60
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/8c986b60

Branch: refs/heads/develop
Commit: 8c986b60b8db3edd77c10f23704cc9174c52a803
Parents: 822d754
Author: Lisa Owen <lo...@pivotal.io>
Authored: Tue Oct 11 11:37:34 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Tue Oct 11 11:37:34 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 6 ++++++
 1 file changed, 6 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/8c986b60/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index e1a7bce..581377f 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -33,6 +33,12 @@ Hive supports several file formats:
 
 Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
 
+The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above. These include:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
 ## <a id="topic_p2s_lvl_25"></a>Data Type Mapping
 
 ### <a id="hive_primdatatypes"></a>Primitive Data Types


[09/14] incubator-hawq-docs git commit: restructure to use numbered steps

Posted by yo...@apache.org.
restructure to use numbered steps


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/284c3ec2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/284c3ec2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/284c3ec2

Branch: refs/heads/develop
Commit: 284c3ec2db38e8d9020826e3bf292efad76c1819
Parents: 54b2c01
Author: Lisa Owen <lo...@pivotal.io>
Authored: Wed Oct 26 08:38:37 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Wed Oct 26 08:38:37 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 428 ++++++++++++++++++++++---------------------
 1 file changed, 222 insertions(+), 206 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/284c3ec2/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index d4b8f01..82fcc25 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -2,7 +2,7 @@
 title: Accessing Hive Data
 ---
 
-Apache Hive is a distributed data warehousing infrastructure.  Hive facilitates managing large data sets, supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase.
+Apache Hive is a distributed data warehousing infrastructure.  Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase.
 
 This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
 
@@ -13,7 +13,7 @@ This section describes how to use PXF to access Hive data. Options for querying
 
 Before accessing Hive data with HAWQ and PXF, ensure that:
 
--   The PXF HDFS plug-in is installed on all cluster nodes.
+-   The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
 -   The PXF Hive plug-in is installed on all cluster nodes.
-   The Hive JAR files and conf directory are installed on all cluster nodes.
 -   You have tested PXF on HDFS.
@@ -69,7 +69,7 @@ The following table summarizes external mapping rules for Hive primitive types.
 
 Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`.  While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
 
-A complex data type example is provided later in this topic.
+An example using complex data types is provided later in this topic.
 
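
For instance, a minimal sketch of such extraction (assuming the `complextypes_hiveprofile` external table created later in this topic, where the `intarray` column arrives as text such as `[1,2,3]`):

``` sql
postgres=# SELECT index,
             -- strip the surrounding brackets, split on commas, cast one element
             (string_to_array(trim(both '[]' from intarray), ','))[1]::int AS first_element
           FROM complextypes_hiveprofile;
```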
 
 ## <a id="hive_sampledataset"></a>Sample Data Set
@@ -81,27 +81,30 @@ Examples used in this topic will operate on a common data set. This simple data
 - number\_of\_orders - integer
 - total\_sales - double
 
-Prepare the sample data set for use. First, create a text file:
+Prepare the sample data set for use:
 
-```
-$ vi /tmp/pxf_hive_datafile.txt
-```
+1. First, create a text file:
+
+    ```
+    $ vi /tmp/pxf_hive_datafile.txt
+    ```
 
-Add the following data to `pxf_hive_datafile.txt`, employing a comma `,` to separate the field values:
+2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values:
 
-```
-Prague,Jan,101,4875.33
-Rome,Mar,87,1557.39
-Bangalore,May,317,8936.99
-Beijing,Jul,411,11600.67
-San Francisco,Sept,156,6846.34
-Paris,Nov,159,7134.56
-San Francisco,Jan,113,5397.89
-Prague,Dec,333,9894.77
-Bangalore,Jul,271,8320.55
-Beijing,Dec,100,4248.41
-```
-Make note of the path to this file; you will use it in later exercises.
+    ```
+    Prague,Jan,101,4875.33
+    Rome,Mar,87,1557.39
+    Bangalore,May,317,8936.99
+    Beijing,Jul,411,11600.67
+    San Francisco,Sept,156,6846.34
+    Paris,Nov,159,7134.56
+    San Francisco,Jan,113,5397.89
+    Prague,Dec,333,9894.77
+    Bangalore,Jul,271,8320.55
+    Beijing,Dec,100,4248.41
+    ```
+
+Make note of the path to `pxf_hive_datafile.txt`; you will use it in later exercises.
 
 
 ## <a id="hivecommandline"></a>Hive Command Line
@@ -112,32 +115,38 @@ The Hive command line is a subsystem similar to that of `psql`. To start the Hiv
 $ HADOOP_USER_NAME=hdfs hive
 ```
 
-The default Hive database is named `default`. Create a Hive table named `sales_info` to expose our sample data set. Create this table in the `default` database:
+The default Hive database is named `default`. 
 
-``` sql
-hive> CREATE TABLE sales_info (location string, month string,
-        number_of_orders int, total_sales double)
-        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
-        STORED AS textfile;
-```
+### <a id="hivecommandline_createdb"></a>Example: Create a Hive Database
 
-Notice:
+Create a Hive table to expose our sample data set.
 
-- The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
-- The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
+1. Create a Hive table named `sales_info` in the `default` database:
 
-Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` table you just created:
+    ``` sql
+    hive> CREATE TABLE sales_info (location string, month string,
+            number_of_orders int, total_sales double)
+            ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+            STORED AS textfile;
+    ```
 
-``` sql
-hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt'
-        INTO TABLE sales_info;
-```
+    Notice:
 
-Perform a query on `sales_info` to verify the data was loaded successfully:
+    - The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
+    - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
 
-``` sql
-hive> SELECT * FROM sales_info;
-```
+2. Load the `pxf_hive_datafile.txt` sample data file into the `sales_info` table you just created:
+
+    ``` sql
+    hive> LOAD DATA local INPATH '/tmp/pxf_hive_datafile.txt'
+            INTO TABLE sales_info;
+    ```
+
+3. Perform a query on `sales_info` to verify the data was loaded successfully:
+
+    ``` sql
+    hive> SELECT * FROM sales_info;
+    ```
 
 In examples later in this section, you will access the `sales_info` Hive table directly via PXF. You will also insert `sales_info` data into tables of other Hive file format types, and use PXF to access those directly as well.
 
@@ -151,149 +160,156 @@ Use the following syntax to create a HAWQ external table representing Hive data:
 CREATE EXTERNAL TABLE <table_name>
     ( <column_name> <data_type> [, ...] | LIKE <other_table> )
 LOCATION ('pxf://<host>[:<port>]/<hive-db-name>.<hive-table-name>
-    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=delim'])
-FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='delim')
+    ?PROFILE=Hive|HiveText|HiveRC[&DELIMITER=<delim>'])
+FORMAT 'CUSTOM|TEXT' (formatter='pxfwritable_import' | delimiter='<delim>')
 ```
 
 Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described below.
 
 | Keyword  | Value |
 |-------|-------------------------------------|
-| \<host\>    | The HDFS NameNode. |
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
 | \<hive-db-name\>    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
 | \<hive-table-name\>    | Name of the Hive table. |
 | PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
-| DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  `delim` must be a single ascii character or specified in hexadecimal representation. |
+| DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set.  \<delim\> must be a single ascii character or specified in hexadecimal representation. |
 | FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |
-| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in `delim`. |
+| FORMAT (`HiveText` and `HiveRC` profiles) | The `FORMAT` clause must specify `TEXT`. The `delimiter` must be specified a second time in '\<delim\>'. |
 
 
-### <a id="profile_hive"></a>Hive Profile
+## <a id="profile_hive"></a>Hive Profile
 
-The `Hive` profile works with any Hive file format table.
+The `Hive` profile works with any Hive file format.
 
-The following SQL call uses the PXF `Hive` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier:
+### <a id="profile_hive_using"></a>Example: Using the Hive Profile
 
-``` shell
-$ psql -d postgres
-```
+Use the `Hive` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier.
 
-``` sql
-postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text, num_orders int, total_sales float8)
-             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
-             FORMAT 'custom' (formatter='pxfwritable_import');
-```
+1. Create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier:
 
-``` sql
-postgres=# SELECT * FROM salesinfo_hiveprofile;
-```
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hiveprofile(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
+              FORMAT 'custom' (formatter='pxfwritable_import');
+    ```
 
-``` shell
-   location    | month | num_orders | total_sales
----------------+-------+------------+-------------
- Prague        | Jan   |        101 |     4875.33
- Rome          | Mar   |         87 |     1557.39
- Bangalore     | May   |        317 |     8936.99
- ...
+2. Query the table:
 
-```
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hiveprofile;
+    ```
+
+    ``` shell
+       location    | month | num_orders | total_sales
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     ...
 
-### <a id="profile_hivetext"></a>HiveText Profile
+    ```
+
+## <a id="profile_hivetext"></a>HiveText Profile
 
 Use the `HiveText` profile to query text formats. The `HiveText` profile is more performant than the `Hive` profile.
 
 **Note**: When using the `HiveText` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
 
-Use the PXF `HiveText` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier:
+### <a id="profile_hivetext_using"></a>Example: Using the HiveText Profile
 
-``` sql
-postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
-             LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
-             FORMAT 'TEXT' (delimiter=E',');
-```
+Use the PXF `HiveText` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier.
 
-(You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+1. Create the external table:
 
-Notice:
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivetextprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=HiveText&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
 
-- The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
-- The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` ???
+    (You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
 
-Query the external table:
+    Notice:
 
-``` sql
-postgres=# SELECT * FROM salesinfo_hivetextprofile where location="Beijing";
-```
+    - The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
+    - The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` escapes the character.
 
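    For instance (a hypothetical variant, not part of this example), a tab-delimited Hive table would specify the tab character in both clauses, as hex `\x09` in `LOCATION` and as the escaped `E'\t'` in `FORMAT`:

    ``` sql
    -- hypothetical: sales_info_tab is a tab-delimited Hive table
    postgres=# CREATE EXTERNAL TABLE example_tabdelim(location text, month text, num_orders int, total_sales float8)
                 LOCATION ('pxf://namenode:51200/default.sales_info_tab?PROFILE=HiveText&DELIMITER=\x09')
                 FORMAT 'TEXT' (delimiter=E'\t');
    ```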
-``` shell
- location | month | num_orders | total_sales
-----------+-------+------------+-------------
- Beijing  | Jul   |        411 |    11600.67
- Beijing  | Dec   |        100 |     4248.41
-(2 rows)
-```
+2. Query the external table:
 
-### <a id="profile_hiverc"></a>HiveRC Profile
+    ``` sql
+    postgres=# SELECT * FROM salesinfo_hivetextprofile WHERE location='Beijing';
+    ```
 
-#### <a id="profile_hiverc_rcfiletbl"></a>RCFile Format Hive Table
+    ``` shell
+     location | month | num_orders | total_sales
+    ----------+-------+------------+-------------
+     Beijing  | Jul   |        411 |    11600.67
+     Beijing  | Dec   |        100 |     4248.41
+    (2 rows)
+    ```
 
-The RCFile Hive format is used for row columnar formatted data. Create a Hive table with RCFile format:
+## <a id="profile_hiverc"></a>HiveRC Profile
 
-``` shell
-$ HADOOP_USER_NAME=hdfs hive
-```
+The RCFile Hive format is used for row columnar formatted data. The `HiveRC` profile provides access to RCFile data.
 
-``` sql
-hive> CREATE TABLE sales_info_rcfile (location string, month string,
-        number_of_orders int, total_sales double)
-        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
-        STORED AS rcfile;
-```
+### <a id="profile_hiverc_rcfiletbl_using"></a>Example: Using the HiveRC Profile
 
-Insert the data from the `sales_info` table into `sales_info_rcfile`:
+Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile for this file format type.
 
-``` sql
-hive> INSERT INTO TABLE sales_info_rcfile SELECT * FROM sales_info;
-```
+1. Create a Hive table with RCFile format:
+
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
 
-A copy of the sample data set is now stored in RCFile format in `sales_info_rcfile`. Perform a Hive query on `sales_info_rcfile` to verify the data was loaded successfully:
+    ``` sql
+    hive> CREATE TABLE sales_info_rcfile (location string, month string,
+            number_of_orders int, total_sales double)
+          ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+          STORED AS rcfile;
+    ```
 
-``` sql
-hive> SELECT * FROM sales_info_rcfile;
-```
-#### <a id="profile_hiverc_pxfquery"></a>PXF Query
+2. Insert the data from the `sales_info` table into `sales_info_rcfile`:
 
-Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `HiveRC` profile is more performant than the `Hive` profile.
+    ``` sql
+    hive> INSERT INTO TABLE sales_info_rcfile SELECT * FROM sales_info;
+    ```
 
-**Note**: When using the `HiveRC` profile, you *must* specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses.
+    A copy of the sample data set is now stored in RCFile format in `sales_info_rcfile`. 
+    
+3. Perform a Hive query on `sales_info_rcfile` to verify the data was loaded successfully:
 
-Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the Hive `sales_info_rcfile` table created in the previous section:
+    ``` sql
+    hive> SELECT * FROM sales_info_rcfile;
+    ```
 
-``` sql
-postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
-             LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c')
-             FORMAT 'TEXT' (delimiter=E',');
-```
+4. Use the PXF `HiveRC` profile to create a queryable HAWQ external table from the Hive `sales_info_rcfile` table created in the previous step. When using the `HiveRC` profile, you **must** specify a delimiter option in *both* the `LOCATION` and `FORMAT` clauses:
 
-(Again, you can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE salesinfo_hivercprofile(location text, month text, num_orders int, total_sales float8)
+                 LOCATION ('pxf://namenode:51200/default.sales_info_rcfile?PROFILE=HiveRC&DELIMITER=\x2c')
+               FORMAT 'TEXT' (delimiter=E',');
+    ```
 
-Query the external table:
+    (Again, you can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
 
-``` sql
-postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
-```
+5. Query the external table:
 
-``` shell
-   location    | total_sales
----------------+-------------
- Prague        |     4875.33
- Rome          |     1557.39
- Bangalore     |     8936.99
- Beijing       |    11600.67
- ...
-```
+    ``` sql
+    postgres=# SELECT location, total_sales FROM salesinfo_hivercprofile;
+    ```
 
-### <a id="topic_dbb_nz3_ts"></a>Accessing Parquet-Format Hive Tables
+    ``` shell
+       location    | total_sales
+    ---------------+-------------
+     Prague        |     4875.33
+     Rome          |     1557.39
+     Bangalore     |     8936.99
+     Beijing       |    11600.67
+     ...
+    ```
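+
+    You can run ordinary SQL over the external table; for example, a quick aggregation (a sketch using only the table defined above) sums sales by location:
+
+    ``` sql
+    postgres=# SELECT location, sum(total_sales) AS location_sales
+                 FROM salesinfo_hivercprofile
+                 GROUP BY location
+                 ORDER BY location_sales DESC;
+    ```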
+
+## <a id="topic_dbb_nz3_ts"></a>Accessing Parquet-Format Hive Tables
 
 The PXF `Hive` profile supports both non-partitioned and partitioned Hive tables that use the Parquet storage format in HDFS. Simply map the table columns using equivalent HAWQ data types. For example, if a Hive table is created using:
 
@@ -310,17 +326,17 @@ postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, cust
     FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```
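 
 As a minimal sketch of this column mapping (hypothetical table and column names, not the elided example above), a Hive Parquet table declaring `string`, `int`, and `double` columns maps to `text`, `int`, and `float8` in HAWQ:
 
 ``` sql
 hive> CREATE TABLE inventory_parquet (item string, qty int, unit_price double)
         STORED AS parquet;
 ```
 
 ``` sql
 postgres=# CREATE EXTERNAL TABLE pxf_inventory_parquet (item text, qty int, unit_price float8)
     LOCATION ('pxf://namenode:51200/default.inventory_parquet?PROFILE=Hive')
     FORMAT 'CUSTOM' (formatter='pxfwritable_import');
 ```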
 
-### <a id="profileperf"></a>Profile Performance Considerations
+## <a id="profileperf"></a>Profile Performance Considerations
 
 The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
 
 ?? MORE HERE. ??
 
-### <a id="complex_dt_example"></a>Complex Data Types Example
+## <a id="complex_dt_example"></a>Complex Data Type Example
 
-Prepare a sample data set for use. This data set will employ the array and map complex types, specifically an array of integers and a string key/value pair map.
+This example will employ the array and map complex types, specifically an array of integers and a string key/value pair map.
 
-The data set includes fields with the following names and data types:
+The example data set includes fields with the following names and data types:
 
 - index - int
 - name - string
@@ -329,99 +345,99 @@ The data set includes fields with the following names and data types:
 
 When specifying an array field in a Hive table, you must identify the terminator for each item in the collection. Similarly, the map key termination character must also be specified.
 
-Create a text file from which you will load the data set:
+1. Create a text file from which you will load the data set:
 
-```
-$ vi /tmp/pxf_hive_complex.txt
-```
+    ```
+    $ vi /tmp/pxf_hive_complex.txt
+    ```
 
-Add the following data to `pxf_hive_complex.txt`.  The data uses a comma `,` to separate field values, the percent symbol `%` to separate collection items, and a `:` to terminate map key values:
+2. Add the following data to `pxf_hive_complex.txt`. The data uses a comma `,` to separate field values, the percent symbol `%` to separate collection items, and a `:` to separate each map key from its value:
 
-```
-3,Prague,1%2%3,zone:euro%status:up
-89,Rome,4%5%6,zone:euro
-400,Bangalore,7%8%9,zone:apac%status:pending
-183,Beijing,0%1%2,zone:apac
-94,Sacramento,3%4%5,zone:noam%status:down
-101,Paris,6%7%8,zone:euro%status:up
-56,Frankfurt,9%0%1,zone:euro
-202,Jakarta,2%3%4,zone:apac%status:up
-313,Sydney,5%6%7,zone:apac%status:pending
-76,Atlanta,8%9%0,zone:noam%status:down
-```
+    ```
+    3,Prague,1%2%3,zone:euro%status:up
+    89,Rome,4%5%6,zone:euro
+    400,Bangalore,7%8%9,zone:apac%status:pending
+    183,Beijing,0%1%2,zone:apac
+    94,Sacramento,3%4%5,zone:noam%status:down
+    101,Paris,6%7%8,zone:euro%status:up
+    56,Frankfurt,9%0%1,zone:euro
+    202,Jakarta,2%3%4,zone:apac%status:up
+    313,Sydney,5%6%7,zone:apac%status:pending
+    76,Atlanta,8%9%0,zone:noam%status:down
+    ```
 
-Create a Hive table to represent this data:
+3. Create a Hive table to represent this data:
 
-``` shell
-$ HADOOP_USER_NAME=hdfs hive
-```
+    ``` shell
+    $ HADOOP_USER_NAME=hdfs hive
+    ```
 
-``` sql
-postgres=# CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>)
+    ``` sql
+    hive> CREATE TABLE table_complextypes( index int, name string, intarray ARRAY<int>, propmap MAP<string, string>)
              ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
              COLLECTION ITEMS TERMINATED BY '%'
              MAP KEYS TERMINATED BY ':'
              STORED AS TEXTFILE;
-```
+    ```
 
-Notice:
+    Notice:
 
-- `FIELDS TERMINATED BY` identifies a comma as the field terminator.
-- The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
-- `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map keys.
+    - `FIELDS TERMINATED BY` identifies a comma as the field terminator.
+    - The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
+    - `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map keys.
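+
+    For instance, in the first sample record, `3,Prague,1%2%3,zone:euro%status:up`, the commas delimit the four fields, the percent signs separate both the array items (`1`, `2`, `3`) and the two map entries, and the colons bind each map key (`zone`, `status`) to its value.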
 
-Load the `pxf_hive_complex.txt` sample data file into the `table_complextypes` table you just created:
+4. Load the `pxf_hive_complex.txt` sample data file into the `table_complextypes` table you just created:
 
-``` sql
-hive> LOAD DATA local INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
-```
+    ``` sql
+    hive> LOAD DATA local INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
+    ```
 
-Perform a query on `table_complextypes` to verify the data was loaded successfully:
+5. Perform a query on Hive table `table_complextypes` to verify the data was loaded successfully:
 
-``` sql
-hive> SELECT * FROM table_complextypes;
-```
+    ``` sql
+    hive> SELECT * FROM table_complextypes;
+    ```
 
-``` shell
-3	Prague	[1,2,3]	{"zone":"euro","status":"up"}
-89	Rome	[4,5,6]	{"zone":"euro"}
-400	Bangalore	[7,8,9]	{"zone":"apac","status":"pending"}
-...
-```
+    ``` shell
+    3	Prague	[1,2,3]	{"zone":"euro","status":"up"}
+    89	Rome	[4,5,6]	{"zone":"euro"}
+    400	Bangalore	[7,8,9]	{"zone":"apac","status":"pending"}
+    ...
+    ```
 
-Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive `table_complextypes`:
+6. Use the PXF `Hive` profile to create a queryable HAWQ external table representing the Hive `table_complextypes`:
 
-``` sql
-postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
-             LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive')
-             FORMAT 'CUSTOM' (formatter='pxfwritable_import');
-```
+    ``` sql
+    postgres=# CREATE EXTERNAL TABLE complextypes_hiveprofile(index int, name text, intarray text, propmap text)
+                 LOCATION ('pxf://namenode:51200/table_complextypes?PROFILE=Hive')
+               FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+    ```
 
-Notice that the integer array and map complex types are mapped to type text.
+    Notice that the integer array and map complex types are mapped to type text.
 
-Query the external table:
+7. Query the external table:
 
-``` sql
-postgres=# SELECT * FROM complextypes_hiveprofile;
-```
+    ``` sql
+    postgres=# SELECT * FROM complextypes_hiveprofile;
+    ```
 
-``` shell     
- index |    name    | intarray |              propmap
--------+------------+----------+------------------------------------
-     3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
-    89 | Rome       | [4,5,6]  | {"zone":"euro"}
-   400 | Bangalore  | [7,8,9]  | {"zone":"apac","status":"pending"}
-   183 | Beijing    | [0,1,2]  | {"zone":"apac"}
-    94 | Sacramento | [3,4,5]  | {"zone":"noam","status":"down"}
-   101 | Paris      | [6,7,8]  | {"zone":"euro","status":"up"}
-    56 | Frankfurt  | [9,0,1]  | {"zone":"euro"}
-   202 | Jakarta    | [2,3,4]  | {"zone":"apac","status":"up"}
-   313 | Sydney     | [5,6,7]  | {"zone":"apac","status":"pending"}
-    76 | Atlanta    | [8,9,0]  | {"zone":"noam","status":"down"}
-(10 rows)
-```
+    ``` shell     
+     index |    name    | intarray |              propmap
+    -------+------------+----------+------------------------------------
+         3 | Prague     | [1,2,3]  | {"zone":"euro","status":"up"}
+        89 | Rome       | [4,5,6]  | {"zone":"euro"}
+       400 | Bangalore  | [7,8,9]  | {"zone":"apac","status":"pending"}
+       183 | Beijing    | [0,1,2]  | {"zone":"apac"}
+        94 | Sacramento | [3,4,5]  | {"zone":"noam","status":"down"}
+       101 | Paris      | [6,7,8]  | {"zone":"euro","status":"up"}
+        56 | Frankfurt  | [9,0,1]  | {"zone":"euro"}
+       202 | Jakarta    | [2,3,4]  | {"zone":"apac","status":"up"}
+       313 | Sydney     | [5,6,7]  | {"zone":"apac","status":"pending"}
+        76 | Atlanta    | [8,9,0]  | {"zone":"noam","status":"down"}
+    (10 rows)
+    ```
 
-`intarray` and `propmap` are each text strings.
+    `intarray` and `propmap` are each text strings.
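+
+    Because PXF surfaces these complex types as `text`, you can post-process them with ordinary string operators in HAWQ. A minimal sketch, using only the external table defined above, trims the array brackets and filters on a map entry:
+
+    ``` sql
+    postgres=# SELECT index, name, btrim(intarray, '[]') AS int_csv
+                 FROM complextypes_hiveprofile
+                 WHERE propmap LIKE '%"status":"up"%';
+    ```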
 
 ## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
 


[14/14] incubator-hawq-docs git commit: Merge branch 'feature/pxfhive-enhance' of https://github.com/lisakowen/incubator-hawq-docs into develop

Posted by yo...@apache.org.
Merge branch 'feature/pxfhive-enhance' of https://github.com/lisakowen/incubator-hawq-docs into develop


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/bf5b6d0d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/bf5b6d0d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/bf5b6d0d

Branch: refs/heads/develop
Commit: bf5b6d0dfbd78607014419e0a81d9b51e9e12a47
Parents: 01f3f8e cd50c59
Author: David Yozie <yo...@apache.org>
Authored: Thu Oct 27 16:32:40 2016 -0700
Committer: David Yozie <yo...@apache.org>
Committed: Thu Oct 27 16:32:40 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 739 +++++++++++++++++++++++++++++--------------
 1 file changed, 503 insertions(+), 236 deletions(-)
----------------------------------------------------------------------



[11/14] incubator-hawq-docs git commit: address david's comments

Posted by yo...@apache.org.
address david's comments


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/8ee05a3d
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/8ee05a3d
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/8ee05a3d

Branch: refs/heads/develop
Commit: 8ee05a3d29d2b74c107cf200352ebd5b16963604
Parents: 2a38a03
Author: Lisa Owen <lo...@pivotal.io>
Authored: Thu Oct 27 12:08:05 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Thu Oct 27 12:08:05 2016 -0700

----------------------------------------------------------------------
 pxf/HivePXF.html.md.erb | 84 ++++++++++++++++++++++++--------------------
 1 file changed, 46 insertions(+), 38 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/8ee05a3d/pxf/HivePXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HivePXF.html.md.erb b/pxf/HivePXF.html.md.erb
index da851db..7d77bc5 100644
--- a/pxf/HivePXF.html.md.erb
+++ b/pxf/HivePXF.html.md.erb
@@ -22,23 +22,19 @@ Before accessing Hive data with HAWQ and PXF, ensure that:
 
 ## <a id="topic_p2s_lvl_25"></a>Hive File Formats
 
-Hive supports several file formats:
+The PXF Hive plug-in supports several file formats and profiles for accessing these formats:
 
--   TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
--   SequenceFile - flat file consisting of binary key/value pairs
--   RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
--   ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
--   Parquet - compressed columnar data representation
--   Avro - JSON-defined, schema-based data serialization format
+| File Format  | Description | Profile |
+|-------|---------------------------|-------|
+| TextFile | Flat file with data in comma-, tab-, or space-separated value format or JSON notation. | Hive, HiveText |
+| SequenceFile | Flat file consisting of binary key/value pairs. | Hive |
+| RCFile | Record columnar data consisting of binary key/value pairs; high row compression rate. | Hive, HiveRC |
+| ORCFile | Optimized row columnar data with stripe, footer, and postscript sections; reduces data size. | Hive |
+| Parquet | Compressed columnar data representation. | Hive |
+| Avro | JSON-defined, schema-based data serialization format. | Hive |
 
 Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
 
-The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above. These include:
-
-- `Hive`
-- `HiveText`
-- `HiveRC`
-
 ## <a id="topic_p2s_lvl_29"></a>Data Type Mapping
 
 ### <a id="hive_primdatatypes"></a>Primitive Data Types
@@ -76,10 +72,12 @@ An example using complex data types is provided later in this topic.
 
 Examples used in this topic will operate on a common data set. This simple data set models a retail sales operation and includes fields with the following names and data types:
 
-- location - text
-- month - text
-- number\_of\_orders - integer
-- total\_sales - double
+| Field Name  | Data Type |
+|-------|---------------------------|
+| location | text |
+| month | text |
+| number\_of\_orders | integer |
+| total\_sales | double |
 
 Prepare the sample data set for use:
 
@@ -131,7 +129,6 @@ Create a Hive table to expose our sample data set.
     ```
 
     Notice that:
-
     - The `STORED AS textfile` subclause instructs Hive to create the table in Textfile (the default) format.  Hive Textfile format supports comma-, tab-, and space-separated values, as well as data specified in JSON notation.
     - The `DELIMITED FIELDS TERMINATED BY` subclause identifies the field delimiter within a data record (line). The `sales_info` table field delimiter is a comma (`,`).
 
@@ -142,7 +139,7 @@ Create a Hive table to expose our sample data set.
             INTO TABLE sales_info;
     ```
 
-3. Perform a query on `sales_info` to verify the data was loaded successfully:
+3. Perform a query on `sales_info` to verify that the data was loaded successfully:
 
     ``` sql
     hive> SELECT * FROM sales_info;
@@ -169,8 +166,8 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 | Keyword  | Value |
 |-------|-------------------------------------|
 | \<host\>[:<port\>]    | The HDFS NameNode and port. |
-| \<hive-db-name\>    | Name of the Hive database. If omitted, defaults to the Hive database named `default`. |
-| \<hive-table-name\>    | Name of the Hive table. |
+| \<hive-db-name\>    | The name of the Hive database. If omitted, defaults to the Hive database named `default`. |
+| \<hive-table-name\>    | The name of the Hive table. |
 | PROFILE    | The `PROFILE` keyword must specify one of the values `Hive`, `HiveText`, or `HiveRC`. |
 | DELIMITER    | The `DELIMITER` clause is required for both the `HiveText` and `HiveRC` profiles and identifies the field delimiter used in the Hive data set. \<delim\> must be a single ASCII character or specified in hexadecimal representation. |
 | FORMAT (`Hive` profile)   | The `FORMAT` clause must specify `CUSTOM`. The `CUSTOM` format supports only the built-in `pxfwritable_import` `formatter`.   |
@@ -181,6 +178,9 @@ Hive-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](..
 
 The `Hive` profile works with any Hive file format.
 
+While you can use the `Hive` profile to access any file format, the more specialized `HiveText` and `HiveRC` profiles deliver better performance for their respective file formats.
+
+
 ### <a id="profile_hive_using"></a>Example: Using the Hive Profile
 
 Use the `Hive` profile to create a queryable HAWQ external table from the Hive `sales_info` textfile format table created earlier.
@@ -230,9 +230,8 @@ Use the PXF `HiveText` profile to create a queryable HAWQ external table from th
     (You can safely ignore the "nonstandard use of escape in a string literal" warning and related messages.)
 
     Notice that:
-
     - The `LOCATION` subclause `DELIMITER` value is specified in hexadecimal format. `\x` is a prefix that instructs PXF to interpret the following characters as hexadecimal. `2c` is the hex value for the comma character.
-    - The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` escapes the character.
+    - The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `','`. `E` escapes the character.
 
 2. Query the external table:
 
@@ -277,7 +276,7 @@ Use the `HiveRC` profile to query RCFile-formatted data in Hive tables. The `Hiv
 
     A copy of the sample data set is now stored in RCFile format in `sales_info_rcfile`. 
     
-3. Perform a Hive query on `sales_info_rcfile` to verify the data was loaded successfully:
+3. Perform a Hive query on `sales_info_rcfile` to verify that the data was loaded successfully:
 
     ``` sql
     hive> SELECT * FROM sales_info_rcfile;
@@ -330,18 +329,19 @@ postgres=# CREATE EXTERNAL TABLE pxf_parquet_table (fname text, lname text, cust
 
 The `HiveRC` and `HiveText` profiles are faster than the generic `Hive` profile.
 
-?? MORE HERE. ??
 
 ## <a id="complex_dt_example"></a>Complex Data Type Example
 
 This example will employ the array and map complex types, specifically an array of integers and a string key/value pair map.
 
-The example data set includes fields with the following names and data types:
+The data schema for this example includes fields with the following names and data types:
 
-- index - int
-- name - string
-- intarray - array of integers
-- propmap - map of string key and value pairs
+| Field Name  | Data Type |
+|-------|---------------------------|
+| index | int |
+| name | string |
+| intarray | array of integers |
+| propmap | map of string key and value pairs |
 
 When specifying an array field in a Hive table, you must identify the terminator for each item in the collection. Similarly, the map key termination character must also be specified.
 
@@ -381,7 +381,6 @@ When specifying an array field in a Hive table, you must identify the terminator
     ```
 
     Notice that:
-
     - `FIELDS TERMINATED BY` identifies a comma as the field terminator.
     - The `COLLECTION ITEMS TERMINATED BY` subclause specifies the percent sign as the collection items (array item, map key/value pair) terminator.
     - `MAP KEYS TERMINATED BY` identifies a colon as the terminator for map keys.
@@ -392,7 +391,7 @@ When specifying an array field in a Hive table, you must identify the terminator
     hive> LOAD DATA local INPATH '/tmp/pxf_hive_complex.txt' INTO TABLE table_complextypes;
     ```
 
-5. Perform a query on Hive table `table_complextypes` to verify the data was loaded successfully:
+5. Perform a query on Hive table `table_complextypes` to verify that the data was loaded successfully:
 
     ``` sql
     hive> SELECT * FROM table_complextypes;
@@ -460,7 +459,7 @@ The following diagram depicts how HAWQ integrates with HCatalog to query Hive ta
 3.  PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
 
 
-### <a id="topic_j1l_y55_c5"></a>Usage
+### <a id="topic_j1l_enabling"></a>Enabling HCatalog Integration
 
 To enable HCatalog query integration in HAWQ, perform the following steps:
 
@@ -476,7 +475,8 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
     ``` sql
     postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
     ``` 
-    
+
+### <a id="topic_j1l_y55_c5"></a>Usage    
 
 To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
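 
 A minimal sketch, assuming the integration exposes Hive objects through an `hcatalog` pseudo-schema as `hcatalog.<hive-db-name>.<hive-table-name>` (placeholder names below):
 
 ``` sql
 postgres=# SELECT * FROM hcatalog.hive_db_name.hive_table_name;
 ```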
 
@@ -574,10 +574,18 @@ hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price
 Load data into this Hive table and add some partitions:
 
 ``` sql
-hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
-hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') VALUES ('cube', 'widget', 11, 1.17);
-hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') VALUES ('dowel', 'widget', 51, 31.82);
-hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') VALUES ('px49', 'pipe', 52, 99.82);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') 
+        VALUES ('block', 'widget', 33, 15.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento') 
+        VALUES ('cube', 'widget', 11, 1.17);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno') 
+        VALUES ('dowel', 'widget', 51, 31.82);
+hive> INSERT INTO TABLE sales_part 
+        PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas') 
+        VALUES ('px49', 'pipe', 52, 99.82);
 ```
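 
 With the partitions in place, a PXF external table defined over `sales_part` with the `Hive` profile exposes `delivery_state` and `delivery_city` as ordinary columns; a sketch, assuming a hypothetical external table named `pxf_sales_part`:
 
 ``` sql
 postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'NEVADA';
 ```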
 
 The Hive storage directory structure for the `sales_part` table appears as follows: