Posted to commits@hawq.apache.org by yo...@apache.org on 2016/10/26 18:31:02 UTC
[02/14] incubator-hawq-docs git commit: more content and rearranging
of pxf hdfs plugin page
Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/2da7a92a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/2da7a92a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/2da7a92a
Branch: refs/heads/develop
Commit: 2da7a92a3e8431335a48005d55a70c9eba333e16
Parents: 9ca2779
Author: Lisa Owen <lo...@pivotal.io>
Authored: Mon Oct 17 16:27:23 2016 -0700
Committer: Lisa Owen <lo...@pivotal.io>
Committed: Mon Oct 17 16:27:23 2016 -0700
----------------------------------------------------------------------
pxf/HDFSFileDataPXF.html.md.erb | 331 ++++++++---------------------------
1 file changed, 76 insertions(+), 255 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/2da7a92a/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index e1c621f..e49688e 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -17,27 +17,27 @@ Before working with HDFS file data using HAWQ and PXF, ensure that:
The PXF HDFS plug-in supports the following file formats:
-- TextFile - comma-separated value or delimited format plain text file
+- TextFile - comma-separated value (.csv) or delimited format plain text file
- SequenceFile - flat file consisting of binary key/value pairs
- Avro - JSON-defined, schema-based data serialization format
The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
-- `HdfsTextSimple`
-- `HdfsTextMulti`
-- `SequenceWritable`
-- `Avro`
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `SequenceWritable` - SequenceFile
+- `Avro` - Avro files
## <a id="hdfsplugin_datatypemap"></a>Data Type Mapping
jjj
## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
-HAWQ includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying, listing, changing file permissions, etc.
+Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc.
The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
-`hdfs dfs` options used in this section are listed in the table below:
+`hdfs dfs` options used in this section are identified in the table below:
| Option | Description |
|-------|-------------------------------------|
@@ -57,7 +57,7 @@ Create a delimited plain text file:
$ vi /tmp/pxf_hdfs_ts.txt
```
-Add the following data to `pxf_hdfs_ts.txt`:
+Copy and paste the following data into `pxf_hdfs_ts.txt`:
``` pre
Prague,Jan,101,4875.33
@@ -66,7 +66,7 @@ Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
```
-Notice the use of the comma `,` to separate field values.
+Notice the use of the comma `,` to separate the four data fields.
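+The four-field, comma-delimited layout can be sanity-checked locally before loading. This sketch (using only sample rows shown above; `/tmp` paths are illustrative) counts rows that do not have exactly four fields:
+
+``` shell
+# Count rows that do not contain exactly 4 comma-separated fields;
+# a well-formed file reports 0.
+printf 'Prague,Jan,101,4875.33\nBeijing,Jul,411,11600.67\n' |
+  awk -F',' 'NF != 4 { bad++ } END { print bad + 0 }'
+# prints 0
+```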
Add the data file to HDFS:
@@ -86,7 +86,7 @@ Create a second delimited plain text file:
$ vi /tmp/pxf_hdfs_tm.txt
```
-Add the following data to `pxf_hdfs_tm.txt`:
+Copy/paste the following data into `pxf_hdfs_tm.txt`:
``` pre
"4627 Star Rd.
@@ -101,7 +101,7 @@ Chicago, IL 60605":Jul:2017
Columbus, OH 43213":Dec:2018
```
-Notice the use of the colon `:` to separate field values. Also notice the quotes around the first/address field. This field includes an embedded line feed.
+Notice the use of the colon `:` to separate the three fields. Also notice the quotes around the first (address) field. This field includes an embedded line feed.
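+The embedded line feed is exactly why this file needs the `HdfsTextMulti` profile rather than `HdfsTextSimple`. A quick local check (using a representative record assembled from the fragments above; the file path is illustrative) shows one logical record spanning two physical lines:
+
+``` shell
+# A single logical record whose quoted address field contains a line feed:
+printf '"4627 Star Rd.\nChicago, IL 60605":Jul:2017\n' > /tmp/pxf_tm_check.txt
+wc -l < /tmp/pxf_tm_check.txt              # 2 physical lines...
+grep -c ':Jul:2017' /tmp/pxf_tm_check.txt  # ...but only 1 record terminator
+```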
Add the data file to HDFS:
@@ -112,7 +112,7 @@ $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_tm.txt /data/pxf_examples/
You will use these HDFS files in later sections.
## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
-The PXF HDFS plug-in supports several HDFS-related profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable`, and `Avro`.
+The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable`, and `Avro`.
Use the following syntax to create a HAWQ external table representing HDFS data:
@@ -126,24 +126,29 @@ FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
-**Note**: Some profile-specific options and properties may be discussed in the relevant profile section later in this topic.
-
| Keyword | Value |
|-------|-------------------------------------|
-| host | The HDFS NameNode. |
-| \<path-to-hdfs-file\> | path to the file in the HDFS data store |
-| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable` or `Avro`. |
-| \<custom-option\> | \<custom-option\> is profile-specific. |
-| FORMAT 'TEXT' | Use `TEXT` `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file. |
-| FORMAT 'CSV' | Use `CSV` `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file. |
-| FORMAT 'CUSTOM' | Use the`CUSTOM` `FORMAT` with `Avro` and `SequenceWritable` profiles. The `CUSTOM` format supports only the built-in `formatter='pxfwritable_export'` \<formatting-property\>. |
-| \<formatting-properties\> | \<formatting-properties\> are profile-specific. |
+| \<host\>[:\<port\>] | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
+| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable`, or `Avro`. |
+| \<custom-option\> | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section.|
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file. |
+| FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with `Avro` and `SequenceWritable` profiles. The '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` \<formatting-property\>. |
+| \<formatting-properties\> | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
+
+*Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
-Note: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
+## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
-### <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
+Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files where each row is a single record.
+
+\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
+
+| Keyword | Value |
+|-------|-------------------------------------|
+| delimiter | The delimiter character in the file. Default value is a comma `,`.|
-Use the `HdfsTextSimple` profile when reading plain text delimited or csv files where each row is a single record.
The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_ts.txt` file you created and added to HDFS in an earlier section:
@@ -173,12 +178,18 @@ gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month tex
gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;
```
-When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default delimiter.
+*Note*: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required; comma is the default delimiter.
-### <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
+## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
Use the `HdfsTextMulti` profile when reading plain text files with delimited single- or multi-line records that include embedded (quoted) linefeed characters.
+\<formatting-properties\> supported by the `HdfsTextMulti` profile include:
+
+| Keyword | Value |
+|-------|-------------------------------------|
+| delimiter | The delimiter character in the file. |
+
The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_tm.txt` file you created and added to HDFS in an earlier section:
``` sql
@@ -204,25 +215,28 @@ gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
(5 rows)
```
-### <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile
+## <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile
Use the `SequenceWritable` profile when reading SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs.
-The `SequenceWritable` profile supports the following \<custom-options\> or \<formatting-properties\>:
+The `SequenceWritable` profile supports the following \<custom-options\>:
| Keyword | Value Description |
|-------|-------------------------------------|
-| COMPRESSION_CODEC | Specifies the compression codec Java class name |
+| COMPRESSION_CODEC | The compression codec Java class name.|
| COMPRESSION_TYPE | The compression type of the sequence file; supported values are `RECORD` (the default) or `BLOCK`. |
-| \<path-to-hdfs-file\> | path to the file in the HDFS data store |
+| DATA-SCHEMA | The name of the writer serialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests will be handled in a single thread. |
-MORE HERE
+???? MORE HERE
-### <a id="profile_hdfsavro"></a>Avro Profile
+??? ADDRESS SERIALIZATION
+
+## <a id="profile_hdfsavro"></a>Avro Profile
Avro files store metadata with the data. Avro files also allow specification of an independent schema used when reading the file.
-#### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
+### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
To represent Avro data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
@@ -230,8 +244,6 @@ Avro supports complex data types including arrays, maps, records, enumerations,
The following table summarizes external mapping rules for Avro data.
-<caption><span class="tablecap">Table 2. Avro Data Type Mapping</span></caption>
-
<a id="topic_oy3_qwm_ss__table_j4s_h1n_ss"></a>
| Avro Data Type | PXF Type |
@@ -241,25 +253,23 @@ The following table summarizes external mapping rules for Avro data.
| Complex type: Fixed | BYTEA |
| Union | Follows the above conventions for primitive or complex data types, depending on the union; supports Null values. |
-#### <a id="profile_hdfsavroptipns"></a>Avro-Specific Formatting Options
+### <a id="profile_hdfsavroptipns"></a>Avro-Specific Custom Options
For complex types, the PXF Avro profile inserts default delimiters between collection items and values. You can use non-default delimiter characters by identifying values for specific Avro custom options in the `CREATE EXTERNAL TABLE` call.
-The Avro profile supports the following custom options:
-
-<caption><span class="tablecap">Table N. Avro Formatting Options</span></caption>
+The Avro profile supports the following \<custom-options\>:
| Option Name | Description
|---------------|--------------------|
| COLLECTION_DELIM | The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps an Avro complex data type to a text column. The default is a comma `,` character. |
| MAPKEY_DELIM | The delimiter character(s) to place between the key and value of a map entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
-| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text colum. The default is a colon `:` character. |
-| SCHEMA-DATA | The data schema file used to create and read�the HDFS file. For an Avro file, you may create an avsc. This option has no default value. |
-| THREAD-SAFE | Determines if the table query can run in multi-thread mode or not. Allowed values are `TRUE`, `FALSE`. Default value is `TRUE` - requests can run in multithread mode. When set to FALSE, requests will be handled in a single thread. |
+| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
+| SCHEMA-DATA | The data schema file used to create and read the HDFS file. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests will be handled in a single thread. |
-#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
-Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema file typically have a `.avsc` suffix.
+Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type. The field type is another schema object
@@ -324,7 +334,7 @@ Copy and paste the following text into `avro_schema.avsc`:
An Avro schema, together with its data, is fully self-describing.
-#### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
+### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
Create a text file named `pxf_hdfs_avro.txt`:
@@ -342,7 +352,7 @@ Enter the following data into `pxf_hdfs_avro.txt`:
The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we are using the [Java Avro tools](http://avro.apache.org/releases.html) and the jar file resides in the current directory:
+Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html); the jar file resides in the current directory:
``` shell
$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
@@ -397,173 +407,14 @@ gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john
jim | {number:9,street:deer creek,city:palo alto}
```
-~~XXXXX
-
-
-### <a id="resolver"></a>Resolver
-
-Choose the Resolver format if data records are serialized in the HDFS file.�
-
-**Note:** You must include a Profile or a Resolver in the table definition.
-
-<table>
-<colgroup>
-<col width="33%" />
-<col width="33%" />
-<col width="33%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Record Serialization</th>
-<th>Resolver</th>
-<th>Comments</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>Avro</td>
-<td>org.apache.hawq.pxf.plugins.\u2009hdfs.AvroResolver</td>
-<td><ul>
-<li>Avro files include the record schema, Avro serialization can be used in other file types (e.g, Sequence File).�</li>
-<li>For Avro serialized records outside of an Avro file, include a schema file name (.avsc) in the url under the optional�<code class="ph codeph">Schema-Data�</code>option.</li>
-<li>Deserialize Only (Read)�.</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>Java Writable</td>
-<td>org.apache.hawq.pxf.plugins.\u2009hdfs.WritableResolver</td>
-<td><ul>
-<li>Include the name of the Java class that uses Writable serialization�in the URL under the optional�<code class="ph codeph">Schema-Data.</code></li>
-<li>The class file must exist in the public stage directory (or in Hadoop's class path).</li>
-<li>Deserialize and Serialize (Read + Write).�</li>
-<li>See <a href="#customizedwritableschemafileguidelines">Customized Writable Schema File Guidelines</a>.</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>None (plain text)</td>
-<td>org.apache.hawq.pxf.plugins.\u2009hdfs.StringPassResolver</td>
-<td><ul>
-<li>Does not serialize plain text records. The database parses�plain records. Passes records as they are.</li>
-<li>Deserialize and Serialize (Read + Write).</li>
-</ul></td>
-</tr>
-</tbody>
-</table>
-
-#### <a id="customizedwritableschemafileguidelines"></a>Schema File Guidelines for WritableResolver
-
-When using a WritableResolver, a schema file needs to be defined. The file needs to be a Java class file and must be on the class path of PXF.
-
-The class file must follow the following requirements:
-
-1. Must implement org.apache.hadoop.io.Writable interface.
-2. WritableResolver uses reflection to recreate the schema and populate its fields (for both read and write). Then it uses the Writable interface functions to read/write. Therefore, fields must be public, to enable access to them. Private fields will be ignored.
-3. Fields are accessed and populated in the order in which they are declared in the class file.
-4. Supported field types:
- - boolean
- - byte array
- - double
- - float
- - int
- - long
- - short
- - string
-
- Arrays of any of the above types are supported, but the constructor must define the array size so the reflection will work.
-
-### <a id="additionaloptions"></a>Additional Options
-
-<a id="additionaloptions__table_skq_kpz_4p"></a>
-
-<table>
-<caption><span class="tablecap">Table 1. Additional PXF Options</span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Option Name</th>
-<th>Description</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>COLLECTION_DELIM</td>
-<td>(Avro or Hive profiles only.) The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps a Hive or Avro complex data type to a text column. The default is a "," character.</td>
-</tr>
-<tr class="even">
-<td>COMPRESSION_CODEC</td>
-<td><ul>
-<li>Useful for WRITABLE PXF tables.</li>
-<li>Specifies the compression codec class name for compressing the written data. The class must implement the�org.apache.hadoop.io.compress.CompressionCodec interface.</li>
-<li>�Some valid values are�org.apache.hadoop.io.compress.DefaultCodec org.apache.hadoop.io.compress.GzipCodec org.apache.hadoop.io.compress.BZip2Codec.</li>
-<li>Note: org.apache.hadoop.io.compress.BZip2Codec runs in a single thread and can be slow.</li>
-<li>This option has no default value.�</li>
-<li>When the option is not defined, no compression will be done.</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>COMPRESSION_TYPE</td>
-<td><ul>
-<li>Useful WRITABLE PXF tables with SequenceFileAccessor.</li>
-<li>Ignored when COMPRESSION_CODEC is not defined.</li>
-<li>Specifies the compression type for sequence file.</li>
-<li>Valid options are:�
-<ul>
-<li>RECORD - only the value part of each row is compressed.</li>
-<li>BLOCK - both keys and values are collected in 'blocks' separately and compressed.</li>
-</ul></li>
-<li>Default value: RECORD.</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>MAPKEY_DELIM</td>
-<td>(Avro or Hive profiles only.) The delimiter character(s) to place between the key and value of a map entry when PXF maps a Hive or Avro complex data type to a text colum. The default is a ":" character.</td>
-</tr>
-<tr class="odd">
-<td>RECORDKEY_DELIM</td>
-<td>(Avro profile only.) The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text colum. The default is a ":" character.</td>
-</tr>
-<tr class="even">
-<td>SCHEMA-DATA</td>
-<td>The data schema file used to create and read�the HDFS file. For example, you could create an avsc (for Avro), or a Java class (for Writable Serialization) file. Make sure that you have added any JAR files containing the schema to <code class="ph codeph">pxf-public.classpath</code>.
-<p>This option has no default value.</p></td>
-</tr>
-<tr class="odd">
-<td>THREAD-SAFE</td>
-<td>Determines if the table query can run in multithread mode or not. When set to FALSE, requests will be handled in a single thread.
-<p>Should be set when a plug-in or other elements that are not thread safe are used (e.g. compression codec).</p>
-<p>Allowed values: TRUE, FALSE. Default value is TRUE - requests can run in multithread mode.</p></td>
-</tr>
-<tr class="even">
-<td>�<custom></td>
-<td>Any option added to the pxf URI string will be accepted and passed, along with its value, to the Fragmenter, Accessor, and Resolver implementations.</td>
-</tr>
-</tbody>
-</table>
-
-## <a id="accessingdataonahighavailabilityhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
-
-To�access data in a High Availability HDFS cluster, change the \<host\>�provided in the URI LOCATION clause. Use *HA\_nameservice* rather than *name\_node\_host:51200*.
-``` sql
-CREATE EXTERNAL TABLE <table_name>
- ( <column_name> <data_type> [, ...] | LIKE <other_table> )
-LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>
- ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
-FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
-```
+## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key
-The opposite is true when a highly available HDFS cluster is reverted to a single namenode configuration. In that case, any table definition that has the \<HA-nameservice\> specified should use the \<host\>[:\<port\>] syntax.�
+When reading sequence files and other file formats that store rows in a key-value format, you can access the key value through HAWQ by using the `recordkey` keyword as a field name.
-## <a id="recordkeyinkey-valuefileformats"></a>Using a Record Key with Key-Value File Formats
+The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
-For sequence file and other file formats that store rows in a key-value format, the key value can be accessed through HAWQ by using the saved keyword '`recordkey`' as a field name.
-
-The field type must correspond to the key type, much as the other fields must match the HDFS data.�
-
-WritableResolver supports read and write of recordkey, which can be of the following Writable Hadoop types:
+`recordkey` can be of the following Hadoop types:
- BooleanWritable
- ByteWritable
@@ -573,59 +424,29 @@ WritableResolver supports read and write of recordkey, which can be of the follo
- LongWritable
- Text
-If the `recordkey` field is not defined, the key is ignored in read, and a default value (segment id as LongWritable) is written in write.
-
### <a id="example1"></a>Example
-A data schema `Babies.class` contains three fields: (name text, birthday text, weight float).�An external table must include these three fields, and can�either include or�ignore the recordkey.
+A data schema `Babies.class` contains three fields: name (text), birthday (text), weight (float). An external table definition for this schema must include these three fields, and can either include or ignore the `recordkey`.
``` sql
--- writable table with recordkey
-CREATE WRITABLE EXTERNAL TABLE babies_registry (recordkey int, name text, birthday text, weight float)
- LOCATION ('pxf://namenode_host:51200/babies_1940s'
- '?ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
- '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
- '&DATA-SCHEMA=Babies')
- FORMAT 'CUSTOM' (formatter='pxfwritable_export');
-INSERT INTO babies_registry VALUES (123456, "James Paul McCartney", "June 18, 1942", 3.800);
-
--- writable table without recordkey
-CREATE WRITABLE EXTERNAL TABLE babies_registry2 (name text, birthday text, weight float)
- LOCATION ('pxf://namenode_host:51200/babies_1940s'
- '?ACCESSOR=org.apache.hawq.pxf.plugins.SequenceFileAccessor'
- '&RESOLVER=org.apache.hawq.pxf.plugins.WritableResolver'
- '&DATA-SCHEMA=Babies')
- FORMAT 'CUSTOM' (formatter='pxfwritable_export');
-
--- this record's key will have some default value
-INSERT INTO babies_registry VALUES ("Richard Starkey", "July 7, 1940", 4.0);
+gpadmin=# CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight float)
+ LOCATION ('pxf://namenode:51200/babies_1940s?PROFILE=SequenceWritable&DATA-SCHEMA=Babies')
+ FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+gpadmin=# SELECT * FROM babies_1940;
```
-The same goes for reading data from an existing file with a key-value format, e.g. a Sequence file.
+## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
+
+To access external HDFS data in a High Availability HDFS cluster, change the URI LOCATION clause to use \<HA-nameservice\> rather than \<host\>[:\<port\>].
``` sql
--- readable table with recordkey
-CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight float)
- LOCATION ('pxf://namenode_host:51200/babies_1940s'
- '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
- '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
- '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
- '&DATA-SCHEMA=Babies')
- FORMAT 'CUSTOM' (formatter='pxfwritable_import');
--- retrieve each record's key
-SELECT * FROM babies_1940;
-
--- readable table without recordkey
-CREATE EXTERNAL TABLE babies_1940_2 (name text, birthday text, weight float)
- LOCATION ('pxf://namenode_host:51200/babies_1940s'
- '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
- '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
- '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
- '&DATA-SCHEMA=Babies')
- FORMAT 'CUSTOM' (formatter='pxfwritable_import');
--- ignores the records' key
-SELECT * FROM babies_1940_2;
+gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type> [, ...] | LIKE <other_table> )
+ LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
+ FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
```
+The opposite is true when a highly available HDFS cluster is reverted to a single NameNode configuration. In that case, any table definition that has the \<HA-nameservice\> specified should use the \<host\>[:\<port\>] syntax.
+
+
## <a id="hdfs_advanced"></a>Advanced
-If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS Accessors and Resolvers. Refer to [XX]() for information on
\ No newline at end of file
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS Accessors and Resolvers. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.
\ No newline at end of file