Posted to hcatalog-commits@incubator.apache.org by ga...@apache.org on 2012/08/22 03:33:52 UTC

svn commit: r1375887 - in /incubator/hcatalog/trunk: CHANGES.txt src/docs/src/documentation/content/xdocs/readerwriter.xml src/docs/src/documentation/content/xdocs/site.xml

Author: gates
Date: Wed Aug 22 03:33:52 2012
New Revision: 1375887

URL: http://svn.apache.org/viewvc?rev=1375887&view=rev
Log:
HCATALOG-444 Document reader & writer interfaces

Added:
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml
Modified:
    incubator/hcatalog/trunk/CHANGES.txt
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml

Modified: incubator/hcatalog/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/CHANGES.txt?rev=1375887&r1=1375886&r2=1375887&view=diff
==============================================================================
--- incubator/hcatalog/trunk/CHANGES.txt (original)
+++ incubator/hcatalog/trunk/CHANGES.txt Wed Aug 22 03:33:52 2012
@@ -38,6 +38,8 @@ Trunk (unreleased changes)
   HCAT-427 Document storage-based authorization (lefty via gates)
 
   IMPROVEMENTS
+  HCAT-444 Document reader & writer interfaces (lefty via gates)
+
   HCAT-425 Pig cannot read/write SMALLINT/TINYINT columns (traviscrawford)
 
   HCAT-460 Enable boolean to integer conversions (traviscrawford)

Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml?rev=1375887&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/readerwriter.xml Wed Aug 22 03:33:52 2012
@@ -0,0 +1,159 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+  <header>
+    <title>Reader and Writer Interfaces</title>
+  </header>
+  <body>
+  
+<p>HCatalog provides a data transfer API for parallel input and output
+without using MapReduce. This API provides a way to read data from a
+Hadoop cluster or write data into a Hadoop cluster, using a basic storage
+abstraction of tables and rows.</p>
+<p>The data transfer API has three essential classes:</p>
+<ul>
+    <li>HCatReader &ndash; reads data from a Hadoop cluster</li>
+    <li>HCatWriter &ndash; writes data into a Hadoop cluster</li>
+    <li>DataTransferFactory &ndash; generates reader and writer instances</li>
+</ul>
+<p>Auxiliary classes in the data transfer API include:</p>
+<ul>
+    <li>ReadEntity</li>
+    <li>ReaderContext</li>
+    <li>WriteEntity</li>
+    <li>WriterContext</li>
+</ul>
+<p>The HCatalog data transfer API is designed to facilitate integration
+of external systems with Hadoop.</p>
+
+<!-- ==================================================================== -->
+<section>
+<title>HCatReader</title>
+
+<p>Reading is a two-step process: the first step occurs on the master
+node of the external system, and the second step runs in parallel on
+multiple slave nodes.</p>
+
+<p>Reads are done on a “ReadEntity”. Before you start to read, you must
+define the ReadEntity to read from. This is done through ReadEntity.Builder,
+which lets you specify a database name, table name, partition, and filter
+string. For example:</p>
+
+<source>
+ReadEntity.Builder builder = new ReadEntity.Builder();
+ReadEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
+</source>
+
+<p>The code snippet above defines a ReadEntity object ("<code>entity</code>")
+specifying a table named “mytbl” in a database named “mydb”, which can be used
+to read all the rows of this table.
+Note that this table must exist in HCatalog before the read begins.</p>
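+
+<p>If you need only a subset of the table, the same builder can narrow the
+read. The sketch below is illustrative rather than part of the example above;
+it assumes the builder's <code>withFilter()</code> method takes a partition
+filter string like the one used by the input &amp; output interfaces, and the
+partition column "ds" is hypothetical:</p>
+
+<source>
+ReadEntity.Builder builder = new ReadEntity.Builder();
+ReadEntity entity = builder
+    .withDatabase("mydb")
+    .withTable("mytbl")
+    .withFilter("ds=\"20120822\"")   // hypothetical partition column "ds"
+    .build();
+</source>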
+
+<p>After defining a ReadEntity, you obtain an instance of HCatReader
+using the ReadEntity and cluster configuration:</p>
+
+<source>
+HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
+</source>
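+
+<p>The <code>config</code> argument carries cluster-specific settings such as
+the metastore location. A minimal sketch, assuming the factory accepts a map
+of configuration properties (the URI below is a placeholder):</p>
+
+<source>
+Map&lt;String, String&gt; config = new HashMap&lt;String, String&gt;();
+config.put("hive.metastore.uris", "thrift://metastore.example.com:9083");
+HCatReader reader = DataTransferFactory.getHCatReader(entity, config);
+</source>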
+
+<p>The next step is to obtain a ReaderContext from <code>reader</code>
+as follows:</p>
+
+<source>
+ReaderContext cntxt = reader.prepareRead();
+</source>
+
+<p>All of the above steps occur on the master node. The master node then
+serializes the ReaderContext object and sends it to all the slave nodes,
+which use it to read data as in the following snippet:</p>
+
+<source>
+// On a slave node, obtain a reader for each split in the ReaderContext.
+for (InputSplit split : cntxt.getSplits()) {
+    HCatReader reader = DataTransferFactory.getHCatReader(split, cntxt.getConf());
+    Iterator&lt;HCatRecord&gt; itr = reader.read();
+    while (itr.hasNext()) {
+        HCatRecord record = itr.next();
+        // Process the record here.
+    }
+}
+</source>
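+
+<p>How the serialized ReaderContext travels from the master to the slave
+nodes is up to the external system. A minimal sketch using plain Java
+serialization, assuming ReaderContext is serializable (any equivalent
+transport works):</p>
+
+<source>
+// Master side: serialize the ReaderContext into bytes to distribute.
+ByteArrayOutputStream bos = new ByteArrayOutputStream();
+ObjectOutputStream oos = new ObjectOutputStream(bos);
+oos.writeObject(cntxt);
+oos.close();
+byte[] payload = bos.toByteArray();   // ship these bytes to each slave
+
+// Slave side: reconstruct the ReaderContext from the received bytes.
+ObjectInputStream ois =
+    new ObjectInputStream(new ByteArrayInputStream(payload));
+ReaderContext cntxt = (ReaderContext) ois.readObject();
+ois.close();
+</source>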
+
+</section>
+
+<!-- ==================================================================== -->
+<section>
+<title>HCatWriter</title>
+
+<p>Like reading, writing is a two-step process: the first step occurs on
+the master node, and the second step runs in parallel on the slave
+nodes.</p>
+
+<p>Writes are done on a “WriteEntity” which can be constructed in a fashion
+similar to reads:</p>
+
+<source>
+WriteEntity.Builder builder = new WriteEntity.Builder();
+WriteEntity entity = builder.withDatabase("mydb").withTable("mytbl").build();
+</source>
+
+<p>The code above creates a WriteEntity object ("<code>entity</code>") which
+can be used to write into a table named “mytbl” in the database “mydb”.</p>
+
+<p>After creating a WriteEntity, the next step is to obtain a WriterContext:</p>
+
+<source>
+HCatWriter writer = DataTransferFactory.getHCatWriter(entity, config);
+WriterContext info = writer.prepareWrite();
+</source>
+
+<p>All of the above steps occur on the master node. The master node then
+serializes the WriterContext object and makes it available to all the
+slaves.</p>
+
+<p>On slave nodes, you need to obtain an HCatWriter using WriterContext
+as follows:</p>
+
+<source>
+HCatWriter writer = DataTransferFactory.getHCatWriter(context);
+</source>
+
+<p>The <code>write</code> method of <code>writer</code> then takes an
+iterator of HCatRecord objects as its argument:</p>
+
+<source>
+writer.write(hCatRecordItr);
+</source>
+
+<p>The <code>writer</code> then calls <code>next()</code> on this iterator in
+a loop and writes out all the records returned by the iterator.</p>
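+
+<p>The records themselves are plain HCatRecord objects. A minimal sketch of
+building such an iterator with DefaultHCatRecord follows; the two-column
+layout is only an example and must match the schema of the target table:</p>
+
+<source>
+// Build records whose field order matches the table schema.
+List&lt;HCatRecord&gt; records = new ArrayList&lt;HCatRecord&gt;();
+for (int i = 0; i &lt; 100; i++) {
+    List&lt;Object&gt; fields = new ArrayList&lt;Object&gt;(2);
+    fields.add(i);               // first column, e.g. an int
+    fields.add("row_" + i);      // second column, e.g. a string
+    records.add(new DefaultHCatRecord(fields));
+}
+Iterator&lt;HCatRecord&gt; hCatRecordItr = records.iterator();
+writer.write(hCatRecordItr);
+</source>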
+</section>
+
+<!-- ==================================================================== -->
+<section>
+    <title>Complete Example Program</title>
+<p>A complete Java program for the reader and writer examples above can be found at: <a href="https://svn.apache.org/repos/asf/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/TestReaderWriter.java">https://svn.apache.org/repos/asf/incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/TestReaderWriter.java</a>.</p>
+
+</section>
+
+
+    
+  </body>
+</document>

Modified: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=1375887&r1=1375886&r2=1375887&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml Wed Aug 22 03:33:52 2012
@@ -43,6 +43,7 @@ See http://forrest.apache.org/docs/linki
     <index label="Installation From Tarball" href="install.html" />
     <index label="Load &amp; Store Interfaces" href="loadstore.html" />
     <index label="Input &amp; Output Interfaces" href="inputoutput.html" />
+    <index label="Reader &amp; Writer Interfaces" href="readerwriter.html" />
     <index label="Command Line Interface" href="cli.html" />
     <index label="Storage Formats" href="supportedformats.html" />
     <index label="Dynamic Partitioning" href="dynpartition.html" />