Posted to hcatalog-commits@incubator.apache.org by ga...@apache.org on 2012/08/22 03:33:52 UTC
svn commit: r1375887 - in /incubator/hcatalog/trunk: CHANGES.txt
src/docs/src/documentation/content/xdocs/readerwriter.xml
src/docs/src/documentation/content/xdocs/site.xml
Author: gates
Date: Wed Aug 22 03:33:52 2012
New Revision: 1375887
URL: http://svn.apache.org/viewvc?rev=1375887&view=rev
Log:
HCATALOG-444 Document reader & writer interfaces
Added:
incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml
Modified:
incubator/hcatalog/trunk/CHANGES.txt
incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml
Modified: incubator/hcatalog/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/CHANGES.txt?rev=1375887&r1=1375886&r2=1375887&view=diff
==============================================================================
--- incubator/hcatalog/trunk/CHANGES.txt (original)
+++ incubator/hcatalog/trunk/CHANGES.txt Wed Aug 22 03:33:52 2012
@@ -38,6 +38,8 @@ Trunk (unreleased changes)
HCAT-427 Document storage-based authorization (lefty via gates)
IMPROVEMENTS
+ HCAT-444 Document reader & writer interfaces (lefty via gates)
+
HCAT-425 Pig cannot read/write SMALLINT/TINYINT columns (traviscrawford)
HCAT-460 Enable boolean to integer conversions (traviscrawford)
Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml?rev=1375887&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml Wed Aug 22 03:33:52 2012
@@ -0,0 +1,159 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+ <header>
+ <title>Reader and Writer Interfaces</title>
+ </header>
+ <body>
+
+<p>HCatalog provides a data transfer API for parallel input and output
+without using MapReduce. This API provides a way to read data from a
+Hadoop cluster or write data into a Hadoop cluster, using a basic storage
+abstraction of tables and rows.</p>
+<p>The data transfer API has three essential classes:</p>
+<ul>
+ <li>HCatReader – reads data from a Hadoop cluster</li>
+ <li>HCatWriter – writes data into a Hadoop cluster</li>
+ <li>DataTransferFactory – generates reader and writer instances</li>
+</ul>
+<p>Auxiliary classes in the data transfer API include:</p>
+<ul>
+ <li>ReadEntity</li>
+ <li>ReaderContext</li>
+ <li>WriteEntity</li>
+ <li>WriterContext</li>
+</ul>
+<p>The HCatalog data transfer API is designed to facilitate integration
+of external systems with Hadoop.</p>
+
+<!-- ==================================================================== -->
+<section>
+<title>HCatReader</title>
+
+<p>Reading is a two-step process in which the first step occurs on
+the master node of an external system. The second step is done in
+parallel on multiple slave nodes.</p>
+
+<p>Reads are done on a "ReadEntity". Before you start to read, you need to define
+a ReadEntity from which to read. This can be done through ReadEntity.Builder. You
+can specify a database name, table name, partition, and filter string.
+For example:</p>
+
+<source>
+ReadEntity.Builder builder = new ReadEntity.Builder();
+ReadEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
+</source>
+
+<p>The code snippet above defines a ReadEntity object ("<code>entity</code>"),
+comprising a table named "mytbl" in a database named "mydb", which can be used
+to read all the rows of this table.
+Note that this table must exist in HCatalog prior to the start of this
+operation.</p>
+
+<p>After defining a ReadEntity, you obtain an instance of HCatReader
+using the ReadEntity and cluster configuration:</p>
+
+<source>
+HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
+</source>
+
+<p>The next step is to obtain a ReaderContext from <code>reader</code>
+as follows:</p>
+
+<source>
+ReaderContext cntxt = reader.prepareRead();
+</source>
+
+<p>All of the above steps occur on the master node. The master node then
+serializes this ReaderContext object and sends it to all the slave nodes.
+Slave nodes then use this reader context to read data.</p>
+
+<source>
+for (InputSplit split : cntxt.getSplits()) {
+  HCatReader reader = DataTransferFactory.getHCatReader(split,
+      cntxt.getConf());
+  Iterator<HCatRecord> itr = reader.read();
+  while (itr.hasNext()) {
+    HCatRecord read = itr.next(); // process the record here
+  }
+}
+</source>
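The hand-off of the context from the master node to the slave nodes can be sketched with standard Java object serialization. This is only an illustration under stated assumptions: DemoContext is a hypothetical stand-in for ReaderContext, and the text above does not say which serialization mechanism or transport an external system should use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContextHandoffSketch {

    // Hypothetical stand-in for ReaderContext (an assumption, not the HCatalog class):
    // it only carries the names of the splits to be read.
    static class DemoContext implements Serializable {
        private static final long serialVersionUID = 1L;
        final ArrayList<String> splits;

        DemoContext(List<String> splits) {
            this.splits = new ArrayList<>(splits);
        }
    }

    // Master side: turn the context into bytes that can be shipped to the slaves.
    static byte[] serialize(DemoContext ctx) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(ctx);
        }
        return bytes.toByteArray();
    }

    // Slave side: rebuild the context from the received bytes.
    static DemoContext deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DemoContext) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DemoContext onMaster = new DemoContext(Arrays.asList("split-0", "split-1"));
        byte[] wire = serialize(onMaster);        // sent over the network in a real system
        DemoContext onSlave = deserialize(wire);  // each slave reconstructs its own copy
        System.out.println(onSlave.splits);       // prints [split-0, split-1]
    }
}
```

In a real integration, each slave would pass the reconstructed context's splits and configuration to DataTransferFactory, as in the loop above.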
+
+</section>
+
+<!-- ==================================================================== -->
+<section>
+<title>HCatWriter</title>
+
+<p>Similar to reading, writing is also a two-step process in which the first
+step occurs on the master node. Subsequently, the second step occurs in
+parallel on slave nodes.</p>
+
+<p>Writes are done on a "WriteEntity" which can be constructed in a fashion
+similar to reads:</p>
+
+<source>
+WriteEntity.Builder builder = new WriteEntity.Builder();
+WriteEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
+</source>
+
+<p>The code above creates a WriteEntity object ("entity") which can be used
+to write into a table named "mytbl" in the database "mydb".</p>
+
+<p>After creating a WriteEntity, the next step is to obtain a WriterContext:</p>
+
+<source>
+HCatWriter writer = DataTransferFactory.getHCatWriter(entity, config);
+WriterContext context = writer.prepareWrite();
+</source>
+
+<p>All of the above steps occur on the master node. The master node then
+serializes the WriterContext object and makes it available to all the
+slaves.</p>
+
+<p>On slave nodes, you need to obtain an HCatWriter using WriterContext
+as follows:</p>
+
+<source>
+HCatWriter writer = DataTransferFactory.getHCatWriter(context);
+</source>
+
+<p>Then, <code>writer</code> takes an iterator as the argument for
+the <code>write</code> method:</p>
+
+<source>
+writer.write(hCatRecordItr);
+</source>
+
+<p>The <code>writer</code> then calls <code>next()</code> on this iterator in a loop
+and writes out all the records attached to the iterator.</p>
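How a writer consumes the iterator it is handed can be sketched as follows. DemoWriter is a hypothetical stand-in that collects records in memory rather than writing to a cluster; it is not the HCatWriter implementation, and it assumes only the standard java.util.Iterator methods hasNext() and next().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorWriteSketch {

    // Hypothetical stand-in for a writer (an assumption, not HCatWriter):
    // records are plain strings and "writing" appends them to a list.
    static class DemoWriter {
        final List<String> written = new ArrayList<>();

        // Drain the iterator, writing each record until none remain.
        void write(Iterator<String> records) {
            while (records.hasNext()) {
                written.add(records.next());
            }
        }
    }

    public static void main(String[] args) {
        DemoWriter writer = new DemoWriter();
        writer.write(Arrays.asList("row-1", "row-2", "row-3").iterator());
        System.out.println(writer.written.size()); // prints 3
    }
}
```

Because the writer pulls records through the iterator, the caller can generate them lazily instead of holding the full data set in memory.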
+</section>
+
+<!-- ==================================================================== -->
+<section>
+ <title>Complete Example Program</title>
+<p>A complete Java program for the reader and writer examples above can be found at: <a href="https://svn.apache.org/repos/asf/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/TestReaderWriter.java">https://svn.apache.org/repos/asf/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/TestReaderWriter.java</a>.</p>
+
+</section>
+
+
+
+ </body>
+</document>
Modified: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=1375887&r1=1375886&r2=1375887&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml Wed Aug 22 03:33:52 2012
@@ -43,6 +43,7 @@ See http://forrest.apache.org/docs/linki
<index label="Installation From Tarball" href="install.html" />
<index label="Load & Store Interfaces" href="loadstore.html" />
<index label="Input & Output Interfaces" href="inputoutput.html" />
+ <index label="Reader & Writer Interfaces" href="readerwriter.html" />
<index label="Command Line Interface" href="cli.html" />
<index label="Storage Formats" href="supportedformats.html" />
<index label="Dynamic Partitioning" href="dynpartition.html" />