Posted to dev@hawq.apache.org by lisakowen <gi...@git.apache.org> on 2016/10/28 22:26:09 UTC

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

GitHub user lisakowen opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46

    HAWQ-1119 - create doc content for PXF writable profiles

    Created a new section for PXF writable profiles (the HDFS plug-in's HdfsTextSimple and SequenceWritable profiles). Included examples and discussion.
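
    For orientation, a condensed sketch of the DDL pattern the new topic documents, drawn from its HdfsTextSimple example:

        CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1 (location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
            FORMAT 'TEXT' (delimiter=E',');
        INSERT INTO pxf_hdfs_writabletbl_1 VALUES ('Frankfurt', 'Mar', 777, 3956.98);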

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-writable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/46.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #46
    
----
commit f2304ce06b0529177efee1912c6c3c3b9aaf5b1f
Author: Lisa Owen <lo...@pivotal.io>
Date:   2016-10-25T20:14:42Z

    add file for PXF HDFS writable profile topic

commit 80dc3dbe33397ef872265afe50551957cd773bef
Author: Lisa Owen <lo...@pivotal.io>
Date:   2016-10-26T19:32:04Z

    adding more content

commit fddb5b8817fdf3a800eb7099d9417bd05735abab
Author: Lisa Owen <lo...@pivotal.io>
Date:   2016-10-28T21:58:01Z

    flesh out sequencewritable profile section

commit a88c167ead7135a6f43d11cb6c4921fe680e60b9
Author: Lisa Owen <lo...@pivotal.io>
Date:   2016-10-28T22:00:17Z

    change section title

commit a3cbdcf804d936c512f99e1b15905328f4f835f1
Author: Lisa Owen <lo...@pivotal.io>
Date:   2016-10-28T22:13:54Z

    add link to writing to HDFS in pxf overview page

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814629
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    --- End diff --
    
    Global - make sure the syntax is correct.  Either CREATE WRITABLE EXTERNAL TABLE or CREATE EXTERNAL WRITABLE TABLE.  I hope it's the former...
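
    For reference, the accepted ordering in HAWQ (as in GPDB) puts WRITABLE before EXTERNAL; a minimal sketch of the corrected syntax, with placeholder names:

        CREATE WRITABLE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] )
        LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>?PROFILE=HdfsTextSimple')
        FORMAT 'TEXT' (delimiter=E',');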



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by kavinderd <gi...@git.apache.org>.
Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96090205
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF classpath. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +This example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class to use for serializing/deserializing the data. This class must provide read and write methods for the fields in the schema associated with the data.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Data Using the SequenceWritable Profile
    +
    +In this example, you will create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples. You will then use this class to access a writable external table created with the `SequenceWritable` profile. 
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class JAR file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent classpath by updating the `pxf-public.classpath` file. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    +
    +    ``` pre
    +    /tmp/pxfex-customwritable.jar
    +    ```
    +
    +    If you have a command-line-managed HAWQ cluster, perform the following steps on each node in your HAWQ cluster:
    +   - Directly edit `/etc/pxf/conf/pxf-public.classpath` and add the line above.
    +   - Restart the PXF Agent:
    +
    +        ``` shell
    +        $ sudo service pxf-service restart
    +        ```
    +
    +5. Use the PXF `SequenceWritable` profile to create a writable HAWQ external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when creating the writable table.
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable&COMPRESSION_TYPE=BLOCK&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +    ```
    +
    +    Notice that the `'CUSTOM'` `FORMAT` \<formatting-properties\> specify the built-in `pxfwritable_export` formatter.
    +
    +6. Insert some data into `pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +6. Recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a HAWQ readable external table referencing the writable table's HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE read_pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +    ```
    +    The `DATA-SCHEMA` \<custom-option\> must be specified when reading an HDFS file via the `SequenceWritable` profile. Compression-related options need not be provided.
    +    
    +6. Query the readable external table `read_pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit;
    +    ```
    +    
    +    ``` pre
    +     location  | month | number_of_orders | total_sales 
    +    -----------+-------+------------------+-------------
    +     Frankfurt | Mar   |              777 |     3956.98
    +     Cleveland | Oct   |             3812 |     96645.4
    +    (2 rows)
    +    ```
    +
    +7. To read the `pxf_seqwrit_file` HDFS file via Hive:
    +
    +    IS THIS A VALID USE CASE???  COULD NOT GET THIS TO WORK.
    --- End diff --
    
    Do you still have the DDL's you tried in Hive? What exactly wasn't working?
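
    For context, a hedged sketch of the kind of Hive DDL typically used to point an external table at a SequenceFile directory (the Hive table name here is illustrative). A file written with a custom Writable value class such as PxfExample_CustomWritable would generally also need a matching Hive SerDe, so a plain DDL like this may not deserialize the values:

        CREATE EXTERNAL TABLE pxf_seqwrit_hive (location string, month string, number_of_orders int, total_sales float)
        STORED AS SEQUENCEFILE
        LOCATION '/data/pxf_examples/pxf_seqwrit_file';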



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by lisakowen <gi...@git.apache.org>.
Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86175112
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent class path by updating the `pxf-public.classpath`. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    +
    +    ``` pre
    +    /tmp/pxfex-customwritable.jar
    +    ```
    +
    +    If you have a command-line-managed HAWQ cluster, perform the following steps on each node in your HAWQ cluster:
    +   - Directly edit `/etc/pxf/conf/pxf-public.classpath` and add the line above.
    +   - Restart the PXF Agent:
    +
    +        ``` shell
    +        $ sudo service pxf-service restart
    +        ```
    +
    +5. Use the PXF `SequenceWritable` profile to create a writable HAWQ external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when creating the writable table.
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable&COMPRESSION_TYPE=BLOCK&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +    ```
    +
    +    Notice that the `'CUSTOM'` `FORMAT` \<formatting-properties\> specify the built-in `pxfwritable_export` formatter.
    +
    +6. Insert some data into `pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +6. Recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a HAWQ readable external table referencing the writable table's HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE read_pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +    ```
    +    The `DATA-SCHEMA` \<custom-option\> must be specified when reading an HDFS file via the `SequenceWritable` profile. Compression-related options need not be provided.
    +    
    +6. Query the readable external table `read_pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit;
    +    ```
    +    
    +    ``` pre
    +     location  | month | number_of_orders | total_sales 
    +    -----------+-------+------------------+-------------
    +     Frankfurt | Mar   |              777 |     3956.98
    +     Cleveland | Oct   |             3812 |     96645.4
    +    (2 rows)
    +    ```
    +
    +7. To read the `pxf_seqwrit_file` HDFS file via Hive:
    +
    +    IS THIS A VALID USE CASE???  COULD NOT GET THIS TO WORK.
    +
    +## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
    +
    +Sequence file and other file formats that store rows in a key-value format can access the key value through HAWQ by using the `recordkey` keyword as a field name.
    +
    +The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
    +
    +`recordkey` can be any of the following Hadoop types:
    +
    +-   BooleanWritable
    +-   ByteWritable
    +-   DoubleWritable
    +-   FloatWritable
    +-   IntWritable
    +-   LongWritable
    +-   Text
    +
    +### <a id="example1"></a>Example: Using Record Keys
    +
    +Create an external readable table to access the record keys from the writable table `XXX` we created in SequenceFile format in the section above.  The `recordkey` is of type `int8`. (We did not identify a record key when creating the writable table.
    +
    +``` sql
    +gpadmin=# CREATE  EXTERNAL TABLE read_pxf_tbl_seqwrit_RECKEY (recordkey int8, location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit_RECKEY;
    +```
    +
    +``` pre
    + recordkey |  location   | month | number_of_orders | total_sales 
    +-----------+-------------+-------+------------------+-------------
    +         0 | Frankfurt   | Mar   |              777 |     3956.98
    +         0 | Cleveland   | Oct   |             3812 |     96645.4
    +(2 rows)
    +```
    +
    +The `recordkey` is display as `0`; a record key was not identified when you inserted entries in the writable table.
    --- End diff --
    
    yep, i need to rework the example a bit.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85791950
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:�
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    --- End diff --
    
    That syntax is unfortunate.  GPDB uses CREATE WRITABLE EXTERNAL instead of CREATE EXTERNAL WRITABLE :(



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85791234
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    --- End diff --
    
    Break the last requirement into a separate bullet: Write permissions should be provided to a restricted set of users.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85792675
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name>
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    --- End diff --
    
    Maybe change this to <profile-option>?
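
    Whatever the placeholder ends up being called, a minimal sketch of where such an option lands might help readers (host, port, path, and table name below are illustrative, not from the PR):

    ``` sql
    -- a profile option is appended to the LOCATION URI as &<option>=<value>
    CREATE WRITABLE EXTERNAL TABLE pxf_opts_sketch (id int, val text)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/opts_sketch?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.DefaultCodec')
        FORMAT 'TEXT' (delimiter=E',');
    ```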



---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85793887
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name>
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    --- End diff --
    
    Instead of including parentheticals here (`HdfsTextSimple` profile only), add a third column to indicate which profile(s) the option applies to.


---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85813059
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name>
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    --- End diff --
    
    Small edit:  Writable tables created using the HdfsTextSimple profile can optionally use `record` or `block` compression. The following compression codecs are supported when compression is enabled:
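
    A minimal sketch of the optional compression, mirroring a step that appears later in this file (table and file names are illustrative):

    ``` sql
    -- Gzip-compressed plain-text output through the HdfsTextSimple profile
    CREATE WRITABLE EXTERNAL TABLE sales_gzip (location text, month text, num_orders int, total_sales float8)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/sales_gzip?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
        FORMAT 'TEXT' (delimiter=E',');
    ```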


---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86049842
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    --- End diff --
    
    Edit: In this example you will create a Java class named PxfExample_CustomWritable that will serialize/deserialize the fields in the sample schema used in previous examples. You will then use this class to access a writable external table that uses the SequenceWritable profile.
    



---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86051029
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent class path by updating the `pxf-public.classpath`. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    +
    +    ``` pre
    +    /tmp/pxfex-customwritable.jar
    +    ```
    +
    +    If you have a command-line-managed HAWQ cluster, perform the following steps on each node in your HAWQ cluster:
    +   - Directly edit `/etc/pxf/conf/pxf-public.classpath` and add the line above.
    +   - Restart the PXF Agent:
    +
    +        ``` shell
    +        $ sudo service pxf-service restart
    +        ```
    +
    +5. Use the PXF `SequenceWritable` profile to create a writable HAWQ external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when creating the writable table.
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable&COMPRESSION_TYPE=BLOCK&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +    ```
    +
    +    Notice that the `'CUSTOM'` `FORMAT` \<formatting-properties\> specify the built-in `pxfwritable_export` formatter.
    +
    +6. Insert some data into `pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +6. Recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a HAWQ readable external table referencing the writable table's HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE read_pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +    ```
    +    The `DATA-SCHEMA` \<custom-option\> must be specified when reading an HDFS file via the `SequenceWritable` profile. Compression-related options need not be provided.
    +    
    +6. Query the readable external table `read_pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit;
    +    ```
    +    
    +    ``` pre
    +     location  | month | number_of_orders | total_sales 
    +    -----------+-------+------------------+-------------
    +     Frankfurt | Mar   |              777 |     3956.98
    +     Cleveland | Oct   |             3812 |     96645.4
    +    (2 rows)
    +    ```
    +
    +7. To read the `pxf_seqwrit_file` HDFS file via Hive:
    +
    +    IS THIS A VALID USE CASE???  COULD NOT GET THIS TO WORK.
    +
    +## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
    +
    +Sequence file and other file formats that store rows in a key-value format can access the key value through HAWQ by using the `recordkey` keyword as a field name.
    --- End diff --
    
    This needs some edits:
    
    When a HAWQ external table references a SequenceFile file or another file format that stores rows in a key-value format, you can access the key values in HAWQ queries by using the `recordkey` keyword as a field name.
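
    A minimal sketch of that usage, reusing the SequenceWritable example from this file (the recordkey column type must match the key class of the underlying file; int here is an assumption):

    ``` sql
    -- expose the record key as a leading column of the readable table
    CREATE EXTERNAL TABLE pxf_seqwrit_key (recordkey int, location text, month text, number_of_orders integer, total_sales real)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
        FORMAT 'CUSTOM' (formatter='pxfwritable_import');

    SELECT recordkey, location, total_sales FROM pxf_seqwrit_key;
    ```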


---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85812589
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name>
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    --- End diff --
    
    multi-thread -> multi-threaded. Also, the rest needs some edits:
    
    The default value is true. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression).
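
    A minimal sketch of the `FALSE` setting, assuming the compression case mentioned above (table and file names are illustrative):

    ``` sql
    -- force single-threaded request handling when combining with compression
    CREATE WRITABLE EXTERNAL TABLE pxf_singlethread (location text, month text, num_orders int, total_sales float8)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/singlethread?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec&THREAD-SAFE=FALSE')
        FORMAT 'TEXT' (delimiter=E',');
    ```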


---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by lisakowen <gi...@git.apache.org>.
Github user lisakowen closed the pull request at:

    https://github.com/apache/incubator-hawq-docs/pull/46


---

[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86050016
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ASCII comma character `,`. The `E` prefix identifies the delimiter value as an escape string constant.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.37
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
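    +
    +    For example, one possible readable table definition for this file follows (a sketch; the table name is arbitrary). Because the writable table used a colon `:` delimiter, specify the same delimiter here:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r2(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```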
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
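    +
    +For example, a writable table that uses the sample `PxfExample_CustomWritable` serialization class developed in the example below might be defined as follows. This is a sketch only; the table name, columns, and HDFS path are placeholders:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE pxf_seqwrit_sketch (location text, month text, number_of_orders integer, total_sales real)
    +    LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_sketch?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +```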
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    --- End diff --
    
    jar -> JAR



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96702964
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -413,4 +411,4 @@ gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit_RECKEY;
     (2 rows)
     ```
     
    -The `recordkey` is displayed as `0`; a record key was not identified when you inserted entries in the writable table.
    +The `recordkey` is displayed as `0`; you did not identify a record key when you inserted entries in the writable table.
    --- End diff --
    
    Edit:  The `recordkey` value is `0` because you did not identify a record key when you inserted entries into the writable table.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96702781
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -396,7 +394,7 @@ The field type of `recordkey` must correspond to the key type, much as the other
     
     ### <a id="example1"></a>Example: Using Record Keys
     
    -Create an external readable table to access the record keys from the writable table `pxf_tbl_seqwrit` that you created in [Example: Writing Data Using the SequenceWritable Profile](#profile_hdfsseqwritable_writing).  The `recordkey` is of type `int8`. (We did not identify a record key when creating the writable table.
    +Create an external readable table to access the record keys from the writable table `pxf_tbl_seqwrit` that you created in [Example: Writing Data Using the SequenceWritable Profile](#profile_hdfsseqwritable_writing).  The `recordkey` is of type `int8`. (You did not identify a record key when you created the writable table.
    --- End diff --
    
    Edit last sentence:  The `recordkey` is of type `int8` because you did not identify a record key when you created the writable table.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by lisakowen <gi...@git.apache.org>.
Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86042160
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    --- End diff --
    
    profile-options is a better name, i agree.  all of the plug-in pages are currently using custom-options.  i will make the change to profile-options for all plug-ins in a separate PR after this is accepted.  sound ok?



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85792301
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    --- End diff --
    
    writable two -> two writable
    
    Also, seems like there should be some mention of the difference between these profiles by now.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85794241
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    --- End diff --
    
    Change <custom-options> to "<custom-option> values"?



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96702640
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -136,7 +136,7 @@ Perform the following operations to use the PXF `HdfsTextSimple` profile to crea
     			    FORMAT 'CSV';
         ```
     
    -    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character with which you created the writable table was a comma `,`.
    --- End diff --
    
    This sounds a little awkward.  Edit to:
    
    "Specify the the `'CSV'` `FORMAT` for the readable table, because you created the writable table with a comma ',' as the delimiter character.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85793349
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    --- End diff --
    
    Change the wording of this row and the next to something like:  Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the specified \<path-to-hdfs-file\>. 
    
    (same for CSV)



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845526
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. You may recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a readable external HAWQ table referencing the writable table's HDFS file:
    --- End diff --
    
    Remove "You may recall that"  Also, change second section to something like:  "To query data from the newly-created HDFS file, create a readable...



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845296
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    --- End diff --
    
    Best to clarify that it uses/expects the actual table with that data to already exist. 



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86050201
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent class path by updating the `pxf-public.classpath`. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    --- End diff --
    
    class path -> classpath  
    
    Also:  by updating the pxf-public.classpath file.
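
    For the command-line-managed case, a minimal sketch of that step (assuming the jar was copied to /tmp as in step 3, and run on each HAWQ node) could look like:

    ``` shell
    # Append the example jar to the PXF public classpath file
    $ echo "/tmp/pxfex-customwritable.jar" | sudo tee -a /etc/pxf/conf/pxf-public.classpath
    # Restart the PXF agent so the new classpath entry takes effect
    $ sudo service pxf-service restart
    ```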



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by kavinderd <gi...@git.apache.org>.
Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96086235
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    --- End diff --
    
    Is the formatting of this correct for `org.apache.hadoop.io.compress.GzipCodec`? It seems like it should be a part of the row above.
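
    If GzipCodec really is HdfsTextSimple-only, one option (just a sketch; exact wording is up to the author) is to fold it into the row above and call out the restriction inline:

    ``` pre
    | COMPRESSION_CODEC | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported codecs: org.apache.hadoop.io.compress.DefaultCodec, org.apache.hadoop.io.compress.BZip2Codec (both profiles), org.apache.hadoop.io.compress.GzipCodec (HdfsTextSimple only) | HdfsTextSimple, SequenceWritable |
    ```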



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85794041
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    --- End diff --
    
    Change "Keyword" to "Option"?



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85789703
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    --- End diff --
    
    Seems like this last sentence should say something other than "to HDFS", since internal tables are already stored in HDFS but in HAWQ proprietary format.  Not sure what the best wording is - HDFS text files/flat files? 



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86052740
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent class path by updating the `pxf-public.classpath`. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    +
    +    ``` pre
    +    /tmp/pxfex-customwritable.jar
    +    ```
    +
    +    If you have a command-line-managed HAWQ cluster, perform the following steps on each node in your HAWQ cluster:
    +   - Directly edit `/etc/pxf/conf/pxf-public.classpath` and add the line above.
    +   - Restart the PXF Agent:
    +
    +        ``` shell
    +        $ sudo service pxf-service restart
    +        ```
    +
    +5. Use the PXF `SequenceWritable` profile to create a writable HAWQ external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when creating the writable table.
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable&COMPRESSION_TYPE=BLOCK&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +    ```
    +
    +    Notice that the `'CUSTOM'` `FORMAT` \<formatting-properties\> specify the built-in `pxfwritable_export` formatter.
    +
    +6. Insert some data into `pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +6. Recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a HAWQ readable external table referencing the writable table's HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE read_pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +    ```
    +    The `DATA-SCHEMA` \<custom-option\> must be specified when reading an HDFS file via the `SequenceWritable` profile. Compression-related options need not be provided.
    +    
    +6. Query the readable external table `read_pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit;
    +    ```
    +    
    +    ``` pre
    +     location  | month | number_of_orders | total_sales 
    +    -----------+-------+------------------+-------------
    +     Frankfurt | Mar   |              777 |     3956.98
    +     Cleveland | Oct   |             3812 |     96645.4
    +    (2 rows)
    +    ```
    +
    +7. To read the `pxf_seqwrit_file` HDFS file via Hive:
    +
    +    IS THIS A VALID USE CASE???  COULD NOT GET THIS TO WORK.
    +
    +## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
    +
    +Sequence file and other file formats that store rows in a key-value format can access the key value through HAWQ by using the `recordkey` keyword as a field name.
    +
    +The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
    +
    +`recordkey` can be any of the following Hadoop types:
    +
    +-   BooleanWritable
    +-   ByteWritable
    +-   DoubleWritable
    +-   FloatWritable
    +-   IntWritable
    +-   LongWritable
    +-   Text
    +
    +### <a id="example1"></a>Example: Using Record Keys
    +
    +Create an external readable table to access the record keys from the writable table `XXX` we created in SequenceFile format in the section above.  The `recordkey` is of type `int8`. (We did not identify a record key when creating the writable table.)
    +
    +``` sql
    +gpadmin=# CREATE  EXTERNAL TABLE read_pxf_tbl_seqwrit_RECKEY (recordkey int8, location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit_RECKEY;
    +```
    +
    +``` pre
    + recordkey |  location   | month | number_of_orders | total_sales 
    +-----------+-------------+-------+------------------+-------------
    +         0 | Frankfurt   | Mar   |              777 |     3956.98
    +         0 | Cleveland   | Oct   |             3812 |     96645.4
    +(2 rows)
    +```
    +
    +The `recordkey` is displayed as `0`; a record key was not identified when you inserted entries in the writable table.
    --- End diff --
    
    I'm guessing this example is a placeholder, then?  Are you planning to change the previous example so that you can show querying the recordkeys?
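
    In the meantime, a quick way to check what keys actually get written (assuming the example file path from step 5 above) is to dump the sequence file directly; `hdfs dfs -text` decodes SequenceFile format and prints each record as the key, a tab, then the value's toString():

    ``` shell
    # Decode the SequenceFile and print <key><TAB><value.toString()> for each record.
    # The value column is not human-readable here because PxfExample_CustomWritable
    # does not override toString().
    $ hdfs dfs -text /data/pxf_examples/pxf_seqwrit_file/*
    ```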



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86049010
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character `,`. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    --- End diff --
    
    Some edits:  you must provide the name of a Java class to use for serializing/deserializing the data.  This class must provide read and write methods for the fields in the schema associated with the table data.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845000
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports writable two profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)` (write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE`, requests run in multi-threaded mode. When set to `FALSE`, requests will be handled in a single thread.  `THREAD-SAFE` should be set appropriately when operations that are not thread-safe are performed (i.e. compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile] (HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ascii comma character ','. `E` escapes the character.
    --- End diff --
    
    change ',' to (,) or `,` for readability



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85790837
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    --- End diff --
    
    This note could use some editing.  Something like:  Tables that you create with the SequenceWritable profile can only be used for INSERT operations. If you want to query the inserted data, you must define a separate external table that references the new HDFS file, using the ???? or ??? profiles.
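
    For illustration, the INSERT-then-read pattern this note describes might look like the following sketch (the table names and HDFS path below are placeholders, not part of the PR):

    ``` sql
    -- writable table: supports INSERT only
    CREATE WRITABLE EXTERNAL TABLE sales_export (location text, total_sales float8)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/sales_export?PROFILE=HdfsTextSimple')
        FORMAT 'TEXT' (delimiter=E',');

    INSERT INTO sales_export VALUES ('Frankfurt', 3956.98);

    -- separate readable table over the same HDFS file, used for queries
    CREATE EXTERNAL TABLE sales_export_read (location text, total_sales float8)
        LOCATION ('pxf://namenode:51200/data/pxf_examples/sales_export?PROFILE=HdfsTextSimple')
        FORMAT 'CSV';

    SELECT * FROM sales_export_read;
    ```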



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85813812
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE` (multi-threaded mode). When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` to `FALSE` when operations that are not thread-safe are performed (for example, compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no compression, record compression, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    --- End diff --
    
    Edit for passive voice



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by lisakowen <gi...@git.apache.org>.
Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96705982
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -396,7 +394,7 @@ The field type of `recordkey` must correspond to the key type, much as the other
     
     ### <a id="example1"></a>Example: Using Record Keys
     
    -Create an external readable table to access the record keys from the writable table `pxf_tbl_seqwrit` that you created in [Example: Writing Data Using the SequenceWritable Profile](#profile_hdfsseqwritable_writing).  The `recordkey` is of type `int8`. (We did not identify a record key when creating the writable table.
    +Create an external readable table to access the record keys from the writable table `pxf_tbl_seqwrit` that you created in [Example: Writing Data Using the SequenceWritable Profile](#profile_hdfsseqwritable_writing).  The `recordkey` is of type `int8`. (You did not identify a record key when you created the writable table.
    --- End diff --
    
    those two sentences are not directly related.  the type of recordkey will always be int8.  i am going to just remove the "You did not identify a record key..." sentence as a variant is used after the example.




[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by kavinderd <gi...@git.apache.org>.
Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r96086612
  
    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF classpath. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    --- End diff --
    
    I think it's more appropriate to say 'where each line is a single record'. 'row' is more of a database term than a text file one



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86052363
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to binary or text HDFS files.
    +
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile when dealing with binary data.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.  If you want to query inserted data, you must define a separate external readable table that references the new HDFS file using the equivalent readable profile.  ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +
    +| Option  | Value Description | Profile |
    +|-------|-------------------------------------|--------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec` and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. | HdfsTextSimple, SequenceWritable |
    +| DATA-SCHEMA    | The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option is required for the `SequenceWritable` profile and has no default value. | SequenceWritable|
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE`. Set this option to `FALSE` to handle all requests in a single thread for operations that are not thread-safe (for example, compression). | HdfsTextSimple, SequenceWritable| 
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +The `HdfsTextSimple` profile supports the following \<formatting-properties\>:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Data Using the HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +The example also uses the HAWQ table `pxf_hdfs_textsimple` created in that exercise and expects it to exist.
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a HAWQ writable external table with the same data schema as defined above. You will also create a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ASCII comma character `,`. The `E` prefix escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator used in each record of the HDFS file.
    +
    +5. Querying an external writable table is not supported in HAWQ. To query data from the newly-created HDFS file, create a readable external HAWQ table referencing the HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +			    FORMAT 'CSV';
    +    ```
    +
    +    The table is created with the `'CSV'` `FORMAT` because the delimiter character used when creating the writable table was a comma `,`.
    +
    +6. Query the readable external table `pxf_hdfs_textsimple_r1`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_r1;
    +    ```
    +
    +    ``` pre
    +     location  | month | num_orders | total_sales
    +    -----------+-------+------------+-------------
    +      Frankfurt | Mar   |        777 |     3956.98
    +      Cleveland | Oct   |       3812 |    96645.37
    +      Prague    | Jan   |        101 |     4875.33
    +      Rome      | Mar   |         87 |     1557.39
    +      Bangalore | May   |        317 |     8936.99
    +      Beijing   | Jul   |        411 |    11600.67
    +    (6 rows)
    +    ```
    +
    +    The table includes the records you individually inserted, as well as the full contents of the `pxf_hdfs_textsimple` table.
    +
    +7. Create a second HAWQ external writable table, this time using Gzip compression and employing a colon `:` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_2 (location text, month text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple2?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    +              FORMAT 'TEXT' (delimiter=E':');
    +    ```
    +
    +8. Write a few records to the `pxfwritable_hdfs_textsimple2` HDFS file by inserting into the `pxf_hdfs_writabletbl_2` table:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_2 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +9. View the file contents in HDFS; use the `-text` option to `hdfs dfs` to view the compressed data as text:
    +
    +    ``` shell
    +    $ hdfs dfs -text /data/pxf_examples/pxfwritable_hdfs_textsimple2/*
    +    Frankfurt:Mar:777:3956.98
    +    Cleveland:Oct:3812:96645.3
    +    ```
    +
    +    Notice that the colon `:` is the field separator in the HDFS file.
    +
    +    As described in Step 5 above, to query data from the newly-created HDFS file named `pxfwritable_hdfs_textsimple2`, you can create a readable external HAWQ table referencing this HDFS file.
    +
    +
    +## <a id="profile_hdfsseqwritable_using"></a>SequenceWritable Profile
    +
    +Use the HDFS plug-in `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs. 
    +
    +SequenceFile format files can optionally use record or block compression. The following compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +When using the `SequenceWritable` profile to write a SequenceFile format file, you must provide the name of the Java class used to serialize/deserialize the data. This class will provide custom read and write methods for the fields in the schema associated with the data you will be writing.
    +
    +### <a id="profile_hdfsseqwritable_writing"></a>Example: Writing Using the SequenceWritable Profile
    +
    +Before creating a writable external table with the `SequenceWritable` profile, you will first create a Java class named `PxfExample_CustomWritable` that will serialize/deserialize the fields in the sample schema used in previous examples.
    +
    +Perform the following steps to create the Java class and writable table.
    +
    +1. Prepare to create the sample Java class:
    +
    +    ``` shell
    +    $ mkdir -p pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ cd pxfex/com/hawq/example/pxf/hdfs/writable/dataschema
    +    $ vi PxfExample_CustomWritable.java
    +    ```
    +
    +2. Copy and paste the following text into the `PxfExample_CustomWritable.java` file:
    +
    +    ``` java
    +    package com.hawq.example.pxf.hdfs.writable.dataschema;
    +    
    +    import org.apache.hadoop.io.*;
    +import java.io.DataInput;
    +import java.io.DataOutput;
    +import java.io.IOException;
    +import java.lang.reflect.Field;
    +
    +    /**
    + * PxfExample_CustomWritable class - used to serialize and deserialize data with
    + * text, int, and float data types
    + */
    +public class PxfExample_CustomWritable implements Writable {
    +
    +    public String st1, st2;
    +    public int int1;
    +    public float ft;
    +
    +    public PxfExample_CustomWritable() {
    +        st1 = new String("");
    +        st2 = new String("");
    +        int1 = 0;
    +        ft = 0.f;
    +    }
    +
    +    public PxfExample_CustomWritable(int i1, int i2, int i3) {
    +
    +        st1 = new String("short_string___" + i1);
    +        st2 = new String("short_string___" + i1);
    +        int1 = i2;
    +        ft = i1 * 10.f * 2.3f;
    +
    +    }
    +
    +    String GetSt1() {
    +        return st1;
    +    }
    +
    +    String GetSt2() {
    +        return st2;
    +    }
    +
    +    int GetInt1() {
    +        return int1;
    +    }
    +    
    +    float GetFt() {
    +        return ft;
    +    }
    +
    +    @Override
    +    public void write(DataOutput out) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.set(st1);
    +        txt.write(out);
    +        txt.set(st2);
    +        txt.write(out);
    +
    +        IntWritable intw = new IntWritable();
    +        intw.set(int1);
    +        intw.write(out);
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.set(ft);
    +        fw.write(out);
    +    }
    +
    +    @Override
    +    public void readFields(DataInput in) throws IOException {
    +
    +        Text txt = new Text();
    +        txt.readFields(in);
    +        st1 = txt.toString();
    +        txt.readFields(in);
    +        st2 = txt.toString();
    +
    +        IntWritable intw = new IntWritable();
    +        intw.readFields(in);
    +        int1 = intw.get();
    +
    +        FloatWritable fw = new FloatWritable();
    +        fw.readFields(in);
    +        ft = fw.get();
    +    }
    +
    +    public void printFieldTypes() {
    +        Class myClass = this.getClass();
    +        Field[] fields = myClass.getDeclaredFields();
    +
    +        for (int i = 0; i < fields.length; i++) {
    +            System.out.println(fields[i].getType().getName());
    +        }
    +    }
    +}
    +    ```
    +
    +3. Compile and create a Java class jar file for `PxfExample_CustomWritable`:
    +
    +    ``` shell
    +    $ javac -classpath /usr/hdp/2.5.0.0-1245/hadoop/hadoop-common.jar  PxfExample_CustomWritable.java
    +    $ cd ../../../../../../../
    +    $ jar cf pxfex-customwritable.jar com
    +    $ cp pxfex-customwritable.jar /tmp/
    +    ```
    +
    +4. Include the new jar file in the PXF Agent class path by updating the `pxf-public.classpath`. If you use Ambari to manage your cluster, add the following line via the Ambari UI and restart the PXF Agent:
    +
    +    ``` pre
    +    /tmp/pxfex-customwritable.jar
    +    ```
    +
    +    If you have a command-line-managed HAWQ cluster, perform the following steps on each node in your HAWQ cluster:
    +   - Directly edit `/etc/pxf/conf/pxf-public.classpath` and add the line above.
    +   - Restart the PXF Agent:
    +
    +        ``` shell
    +        $ sudo service pxf-service restart
    +        ```
    +
    +5. Use the PXF `SequenceWritable` profile to create a writable HAWQ external table. Identify the serialization/deserialization Java class you created above in the `DATA-SCHEMA` \<custom-option\>. Use `BLOCK` mode compression with `BZip2` when creating the writable table.
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable&COMPRESSION_TYPE=BLOCK&COMPRESSION_CODEC=org.apache.hadoop.io.compress.BZip2Codec')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_export');
    +    ```
    +
    +    Notice that the `'CUSTOM'` `FORMAT` \<formatting-properties\> specify the built-in `pxfwritable_export` formatter.
    +
    +6. Insert some data into `pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Frankfurt', 'Mar', 777, 3956.98 );
    +    gpadmin=# INSERT INTO pxf_tbl_seqwrit VALUES ( 'Cleveland', 'Oct', 3812, 96645.37 );
    +    ```
    +
    +7. Recall that querying an external writable table is not supported in HAWQ. To read the newly-created writable table, create a HAWQ readable external table referencing the writable table's HDFS file:
    +
    +    ``` sql
    +    gpadmin=# CREATE EXTERNAL TABLE read_pxf_tbl_seqwrit (location text, month text, number_of_orders integer, total_sales real)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_seqwrit_file?PROFILE=SequenceWritable&DATA-SCHEMA=com.hawq.example.pxf.hdfs.writable.dataschema.PxfExample_CustomWritable')
    +              FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +    ```
    +    The `DATA-SCHEMA` \<custom-option\> must be specified when reading an HDFS file via the `SequenceWritable` profile. Compression-related options need not be provided.
    +    
    +8. Query the readable external table `read_pxf_tbl_seqwrit`:
    +
    +    ``` sql
    +    gpadmin=# SELECT * FROM read_pxf_tbl_seqwrit;
    +    ```
    +    
    +    ``` pre
    +     location  | month | number_of_orders | total_sales 
    +    -----------+-------+------------------+-------------
    +     Frankfurt | Mar   |              777 |     3956.98
    +     Cleveland | Oct   |             3812 |     96645.4
    +    (2 rows)
    +    ```
    +
    +9. To read the `pxf_seqwrit_file` HDFS file via Hive:
    +
    +    IS THIS A VALID USE CASE???  COULD NOT GET THIS TO WORK.
    +
    +## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
    +
    +When a sequence file or other file format stores rows in a key-value format, you can access the key value through HAWQ by using the `recordkey` keyword as a field name.
    +
    +The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
    +
    +`recordkey` can be any of the following Hadoop types:
    +
    +-   BooleanWritable
    +-   ByteWritable
    +-   DoubleWritable
    +-   FloatWritable
    +-   IntWritable
    +-   LongWritable
    +-   Text
    +
    +### <a id="example1"></a>Example: Using Record Keys
    +
    +Create an external readable table to access the record keys from the writable table `XXX` we created in SequenceFile format in the section above.  The `recordkey` is of type `int8`. (We did not identify a record key when creating the writable table.
    --- End diff --
    
    xxx -> read_pxf_tbl_seqwrit ?
    
    also change "we created in SequenceFile format in the section above" to "that you created in [section](url)"



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85794550
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    --- End diff --
    
    Is DATA-SCHEMA an option, or is it required?



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r86042974
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    --- End diff --
    
    Thanks - that sounds good to me.



[GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...

Posted by dyozie <gi...@git.apache.org>.
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814125
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data:
    +
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
    +| THREAD-SAFE | Boolean value determining if a table query can run in multi-threaded mode. The default value is `TRUE` (multi-threaded mode). When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` to `FALSE` when operations that are not thread-safe are performed (for example, compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can use no compression, record compression, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile
    --- End diff --
    
    Writing *Data* Using the HdfsTextSimple Profile

