Posted to commits@hbase.apache.org by bu...@apache.org on 2016/07/14 19:47:07 UTC
[44/52] [partial] hbase-site git commit: Published site at
a55af38689fbe273e716ebbf6191e9515986dbf3.
http://git-wip-us.apache.org/repos/asf/hbase-site/blob/975096b1/book.html
----------------------------------------------------------------------
diff --git a/book.html b/book.html
index 3e5793e..2dee923 100644
--- a/book.html
+++ b/book.html
@@ -17111,7 +17111,8 @@ of the <a href="#security">Securing Apache HBase</a> chapter.</p>
<p>The following examples use the placeholder server http://example.com:8000, and
the following commands can all be run using <code>curl</code> or <code>wget</code> commands. You can request
 plain text (the default), XML, or JSON output by adding no header for plain text,
-or the header "Accept: text/xml" for XML or "Accept: application/json" for JSON.</p>
+or the header "Accept: text/xml" for XML, "Accept: application/json" for JSON, or
+"Accept: application/x-protobuf" for protocol buffers.</p>
</div>
<div class="admonitionblock note">
<table>
@@ -17126,171 +17127,345 @@ creation or mutation, and <code>DELETE</code> for deletion.
</tr>
</table>
</div>
-<div class="sect3">
-<h4 id="_cluster_information"><a class="anchor" href="#_cluster_information"></a>76.3.1. Cluster Information</h4>
-<div class="listingblock">
-<div class="title">HBase Version</div>
-<div class="content">
-<pre>http://example.com:8000/version/cluster</pre>
-</div>
-</div>
-<div class="listingblock">
-<div class="title">Cluster Status</div>
-<div class="content">
-<pre>http://example.com:8000/status/cluster</pre>
-</div>
-</div>
-<div class="listingblock">
-<div class="title">Table List</div>
-<div class="content">
-<pre>http://example.com:8000/</pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_table_information"><a class="anchor" href="#_table_information"></a>76.3.2. Table Information</h4>
-<div class="paragraph">
-<div class="title">Table Schema (GET)</div>
-<p>To retrieve the table schema, use a <code>GET</code> request with the <code>/schema</code> endpoint:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/schema</pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Table Creation</div>
-<p>To create a table, use a <code>PUT</code> request with the <code>/schema</code> endpoint:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/schema</pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Table Schema Update</div>
-<p>To update a table, use a <code>POST</code> request with the <code>/schema</code> endpoint:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/schema</pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Table Deletion</div>
-<p>To delete a table, use a <code>DELETE</code> request with the <code>/schema</code> endpoint:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/schema</pre>
-</div>
-</div>
-<div class="listingblock">
-<div class="title">Table Regions</div>
-<div class="content">
-<pre>http://example.com:8000/<table>/regions</pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_gets"><a class="anchor" href="#_gets"></a>76.3.3. Gets</h4>
-<div class="paragraph">
-<div class="title">GET a Single Cell Value</div>
-<p>To get a single cell value, use a URL scheme like the following:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/<row>/<column>:<qualifier>/<timestamp>/content:raw</pre>
-</div>
-</div>
-<div class="paragraph">
-<p>The column qualifier and timestamp are optional. Without them, the whole row will
-be returned, or the newest version will be returned.</p>
-</div>
-<div class="paragraph">
-<div class="title">Multiple Single Values (Multi-Get)</div>
-<p>To get multiple single values, specify multiple column:qualifier tuples and/or a start-timestamp
-and end-timestamp. You can also limit the number of versions.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/<row>/<column>:<qualifier>?v=<num-versions></pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Globbing Rows</div>
-<p>To scan a series of rows, you can use a <code>*</code> glob
-character on the <row> value to glob together multiple rows.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/urls/https|ad.doubleclick.net|*</pre>
-</div>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_puts"><a class="anchor" href="#_puts"></a>76.3.4. Puts</h4>
-<div class="paragraph">
-<p>For Puts, <code>PUT</code> and <code>POST</code> are equivalent.</p>
-</div>
-<div class="paragraph">
-<div class="title">Put a Single Value</div>
-<p>The column qualifier and the timestamp are optional.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/put/<table>/<row>/<column>:<qualifier>/<timestamp>
-http://example.com:8000/test/testrow/test:testcolumn</pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Put Multiple Values</div>
-<p>To put multiple values, use a false row key. Row, column, and timestamp values in
-the supplied cells override the specifications on the path, allowing you to post
-multiple values to a table in batch. The HTTP response code indicates the status of
-the put. Set the <code>Content-Type</code> to <code>text/xml</code> for XML encoding or to <code>application/x-protobuf</code>
-for protobufs encoding. Supply the commit data in the <code>PUT</code> or <code>POST</code> body, using
-the <a href="#xml_schema">REST XML Schema</a> and <a href="#protobufs_schema">REST Protobufs Schema</a> as guidelines.</p>
-</div>
-</div>
-<div class="sect3">
-<h4 id="_scans"><a class="anchor" href="#_scans"></a>76.3.5. Scans</h4>
-<div class="paragraph">
-<p><code>PUT</code> and <code>POST</code> are equivalent for scans.</p>
-</div>
-<div class="paragraph">
-<div class="title">Scanner Creation</div>
-<p>To create a scanner, use the <code>/scanner</code> endpoint. The HTTP response code indicates
-success (201) or failure (anything else), and on successful scanner creation, the
-URI is returned which should be used to address the scanner.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/scanner</pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Scanner Get Next</div>
-<p>To get the next batch of cells found by the scanner, use the <code>/scanner/<scanner-id>'
-endpoint, using the URI returned by the scanner creation endpoint. If the scanner
-is exhausted, HTTP status `204</code> is returned.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/scanner/<scanner-id></pre>
-</div>
-</div>
-<div class="paragraph">
-<div class="title">Scanner Deletion</div>
-<p>To delete resources associated with a scanner, send a HTTP <code>DELETE</code> request to the
-<code>/scanner/<scanner-id></code> endpoint.</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre>http://example.com:8000/<table>/scanner/<scanner-id></pre>
-</div>
-</div>
-</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 11. Cluster-Wide Endpoints</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/version/cluster</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Version of HBase running on this cluster</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/version/cluster"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/status/cluster</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Cluster status</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/status/cluster"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">List of all non-system tables</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/"</pre></div></td>
+</tr>
+</tbody>
+</table>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 12. Namespace Endpoints</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">List all namespaces</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces/<em>namespace</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describe a specific namespace</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/special_ns"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces/<em>namespace</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>POST</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Create a new namespace</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X POST \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/special_ns"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces/<em>namespace</em>/tables</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">List all tables in a specific namespace</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/special_ns/tables"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces/<em>namespace</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>PUT</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Alter an existing namespace. Currently not used.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X PUT \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/special_ns"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/namespaces/<em>namespace</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>DELETE</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Delete a namespace. The namespace must be empty.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X DELETE \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/namespaces/special_ns"</pre></div></td>
+</tr>
+</tbody>
+</table>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 13. Table Endpoints</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/schema</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Describe the schema of the specified table.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/schema"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/schema</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>POST</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Create a new table, or replace an existing table’s schema</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X POST \
+ -H "Accept: text/xml" \
+ -H "Content-Type: text/xml" \
+ -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \
+ "http://example.com:8000/users/schema"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/schema</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>PUT</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Update an existing table with the provided schema fragment</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X PUT \
+ -H "Accept: text/xml" \
+ -H "Content-Type: text/xml" \
+ -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \
+ "http://example.com:8000/users/schema"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/schema</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>DELETE</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Delete the table. You must use the <code>/<em>table</em>/schema</code> endpoint, not just <code>/<em>table</em>/</code>.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X DELETE \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/schema"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/regions</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">List the table regions</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/regions"</pre></div></td>
+</tr>
+</tbody>
+</table>
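+<div class="paragraph">
+<p>When a table has several column families, you can generate the <code>TableSchema</code> document for the <code>/schema</code> endpoint instead of typing it by hand. The following is a minimal Bash sketch and is not part of HBase. <code>table_schema_xml</code> is a hypothetical helper that emits one <code>ColumnSchema</code> element per column family, in the same form as the <code>POST</code> example above. It sets no other schema attributes, and it inserts names verbatim without XML escaping.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash"># Hypothetical helper: build a TableSchema document for the /schema endpoint.
+# Usage: table_schema_xml TABLE CF [CF ...]
+# Names are inserted verbatim (no XML escaping).
+table_schema_xml() {
+  local table=$1 cf
+  shift
+  printf '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="%s">' "$table"
+  for cf in "$@"; do
+    printf '<ColumnSchema name="%s" />' "$cf"
+  done
+  printf '</TableSchema>'
+}
+
+# Example (requires a running REST server): create "users" with families cf1 and cf2.
+# curl -vi -X POST \
+#   -H "Accept: text/xml" \
+#   -H "Content-Type: text/xml" \
+#   -d "$(table_schema_xml users cf1 cf2)" \
+#   "http://example.com:8000/users/schema"</code></pre>
+</div>
+</div>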
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 14. Endpoints for <code>Get</code> Operations</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/<em>row</em>/<em>column:qualifier</em>/<em>timestamp</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Get the value of a single row. Values are Base-64 encoded.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/row1"
+
+curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/row1/cf:a/1458586888395"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/<em>row</em>/<em>column:qualifier</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Get the value of a single column. Values are Base-64 encoded.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/row1/cf:a"
+
+curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/row1/cf:a/"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/<em>row</em>/<em>column:qualifier</em>/?v=<em>number_of_versions</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Multi-Get a specified number of versions of a given cell. Values are Base-64 encoded.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/row1/cf:a?v=2"</pre></div></td>
+</tr>
+</tbody>
+</table>
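+<div class="paragraph">
+<p>Row keys, column names, and cell values in these responses are Base-64 encoded, so you must decode them before use. The following minimal Bash sketch assumes a <code>base64</code> utility that accepts <code>-d</code>, such as GNU coreutils. <code>b64d</code> and <code>cell_values</code> are hypothetical helpers, not part of HBase. The <code>grep</code>/<code>sed</code> extraction assumes that each <code>Cell</code> element, including its value, sits on a single line with no nested tags. It is not a general XML parser.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash"># Hypothetical helpers for decoding HBase REST Get responses.
+
+# Decode one Base-64 string to its raw bytes.
+b64d() {
+  printf '%s' "$1" | base64 -d
+}
+
+# Print each decoded cell value from an XML CellSet on stdin, one per line.
+# Assumes every <Cell ...>VALUE</Cell> element sits on a single line.
+cell_values() {
+  grep -o '<Cell[^>]*>[^<]*</Cell>' |
+    sed -e 's/<Cell[^>]*>//' -e 's#</Cell>##' |
+    while IFS= read -r encoded; do
+      b64d "$encoded"
+      printf '\n'
+    done
+}
+
+# Example (requires a running REST server):
+# curl -s -H "Accept: text/xml" "http://example.com:8000/users/row1" | cell_values</code></pre>
+</div>
+</div>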
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 15. Endpoints for <code>Scan</code> Operations</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/scanner/</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>PUT</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Create a Scanner object, which all other Scan operations require. Adjust the batch parameter
+to the number of rows the scan should return in a batch. See the next example for
+adding filters to your scanner. The scanner endpoint URL is returned in the <code>Location</code>
+header of the HTTP response. The other examples in this table assume that the scanner endpoint
+is <code>http://example.com:8000/users/scanner/145869072824375522207</code>.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X PUT \
+ -H "Accept: text/xml" \
+ -H "Content-Type: text/xml" \
+ -d '<Scanner batch="1"/>' \
+ "http://example.com:8000/users/scanner/"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/scanner/</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>PUT</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">To supply filters to the Scanner object or configure the
+Scanner in any other way, you can create a text file and add
+your filter to the file. For example, to return only rows for
+which keys start with <code>u123</code> and use a batch size
+of 100, the filter file would look like this:</p>
+<pre>
+<Scanner batch="100">
+  <filter>
+    {
+      "type": "PrefixFilter",
+      "value": "u123"
+    }
+  </filter>
+</Scanner>
+</pre>
+<p class="tableblock">Pass the file to the <code>-d</code> argument of the <code>curl</code> request.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X PUT \
+ -H "Accept: text/xml" \
+ -H "Content-Type:text/xml" \
+ -d @filter.txt \
+ "http://example.com:8000/users/scanner/"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/scanner/<em>scanner-id</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>GET</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Get the next batch from the scanner. Cell values are Base-64 encoded. If the scanner
+has been exhausted, HTTP status <code>204</code> is returned.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X GET \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/scanner/145869072824375522207"</pre></div></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/scanner/<em>scanner-id</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>DELETE</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Delete the scanner and free the resources it used.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X DELETE \
+ -H "Accept: text/xml" \
+ "http://example.com:8000/users/scanner/145869072824375522207"</pre></div></td>
+</tr>
+</tbody>
+</table>
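+<div class="paragraph">
+<p>The scanner operations in this table are normally used together: create the scanner, fetch batches until the server returns <code>204</code>, then delete the scanner. The following minimal Bash sketch chains these steps. It assumes <code>curl</code> is installed and uses the placeholder server and <code>users</code> table from the examples above. The function names are hypothetical. The sketch reads the scanner URL from the <code>Location</code> response header, and it still deletes the scanner if a batch request returns an unexpected status.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash"># Hypothetical sketch: create a scanner, drain it, then delete it.
+BASE_URL="http://example.com:8000"   # placeholder server from the examples above
+TABLE="users"
+
+# Print the Location header value from raw HTTP response headers on stdin.
+location_from_headers() {
+  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
+}
+
+scan_table() {
+  local scanner_url status batch
+  # 1. Create the scanner; "-D -" writes the response headers to stdout.
+  scanner_url=$(curl -s -o /dev/null -D - -X PUT \
+      -H "Accept: text/xml" -H "Content-Type: text/xml" \
+      -d '<Scanner batch="100"/>' \
+      "$BASE_URL/$TABLE/scanner/" | location_from_headers)
+  if [ -z "$scanner_url" ]; then
+    echo "scanner creation failed" >&2
+    return 1
+  fi
+
+  # 2. Fetch batches until the scanner is exhausted (HTTP 204).
+  batch=$(mktemp)
+  while true; do
+    status=$(curl -s -o "$batch" -w '%{http_code}' \
+        -H "Accept: text/xml" "$scanner_url")
+    case "$status" in
+      200) cat "$batch"; echo ;;
+      204) break ;;
+      *)   echo "unexpected HTTP status $status" >&2; break ;;
+    esac
+  done
+  rm -f "$batch"
+
+  # 3. Delete the scanner to free its resources on the server.
+  curl -s -o /dev/null -X DELETE "$scanner_url"
+}</code></pre>
+</div>
+</div>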
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 16. Endpoints for <code>Put</code> Operations</caption>
+<colgroup>
+<col style="width: 16%;">
+<col style="width: 8%;">
+<col style="width: 25%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Endpoint</th>
+<th class="tableblock halign-left valign-top">HTTP Verb</th>
+<th class="tableblock halign-left valign-top">Description</th>
+<th class="tableblock halign-left valign-top">Example</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>/<em>table</em>/<em>row_key</em></code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>PUT</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Write a row to a table. The row key, column (<code>family:qualifier</code>), and value must each be Base-64
+encoded. To encode a string, use the <code>base64</code> command-line utility, for example <code>echo -n row5 | base64</code>;
+the <code>-n</code> flag stops <code>echo</code> from appending a newline that would otherwise be encoded as well. To decode a
+string, use <code>base64 -d</code>. The payload is in the <code>-d</code> (<code>--data</code>) argument. The <code>/users/fakerow</code>
+value is a placeholder: the row keys in the payload override the row in the path. Insert multiple rows by adding them to the <code><CellSet></code>
+element. You can also save the data to be inserted to a file and pass it to the <code>-d</code>
+parameter with syntax like <code>-d @filename.txt</code>.</p></td>
+<td class="tableblock halign-left valign-top"><div class="literal"><pre>curl -vi -X PUT \
+ -H "Accept: text/xml" \
+ -H "Content-Type: text/xml" \
+ -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQ=="><Cell column="Y2Y6ZQ==">dmFsdWU1</Cell></Row></CellSet>' \
+ "http://example.com:8000/users/fakerow"
+
+curl -vi -X PUT \
+ -H "Accept: application/json" \
+ -H "Content-Type: application/json" \
+ -d '{"Row":[{"key":"cm93NQ==", "Cell": [{"column":"Y2Y6ZQ==", "$":"dmFsdWU1"}]}]}' \
+ "http://example.com:8000/users/fakerow"</pre></div></td>
+</tr>
+</tbody>
+</table>
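+<div class="paragraph">
+<p>You can script the encoding steps for a <code>Put</code> so that no stray newline ends up in the encoded data. Plain <code>echo</code> appends a newline, so <code>echo row5 | base64</code> yields <code>cm93NQo=</code>, which is the encoding of <code>row5</code> followed by a newline. The following minimal Bash sketch assumes a <code>base64</code> utility on the <code>PATH</code>. <code>b64</code> and <code>cellset_xml</code> are hypothetical helpers, not part of HBase, and the generated payload holds a single cell.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="bash"># Hypothetical helpers for building an HBase REST CellSet payload.
+
+# Base-64 encode a string exactly as given: printf '%s' adds no newline,
+# and tr removes the line wrapping that base64 inserts into long output.
+b64() {
+  printf '%s' "$1" | base64 | tr -d '\n'
+}
+
+# Build a one-cell CellSet XML document.
+# Usage: cellset_xml ROW FAMILY:QUALIFIER VALUE
+cellset_xml() {
+  printf '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
+  printf '<CellSet><Row key="%s"><Cell column="%s">%s</Cell></Row></CellSet>' \
+    "$(b64 "$1")" "$(b64 "$2")" "$(b64 "$3")"
+}
+
+# Example (requires a running REST server):
+# curl -vi -X PUT \
+#   -H "Accept: text/xml" \
+#   -H "Content-Type: text/xml" \
+#   -d "$(cellset_xml row5 cf:e value5)" \
+#   "http://example.com:8000/users/fakerow"</code></pre>
+</div>
+</div>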
</div>
<div class="sect2">
<h3 id="xml_schema"><a class="anchor" href="#xml_schema"></a>76.4. REST XML Schema</h3>
@@ -17427,7 +17602,7 @@ is exhausted, HTTP status `204</code> is returned.</p>
<span class="tag"><complexType</span> <span class="attribute-name">name</span>=<span class="string"><span class="delimiter">"</span><span class="content">Node</span><span class="delimiter">"</span></span><span class="tag">></span>
<span class="tag"><sequence></span>
<span class="tag"><element</span> <span class="attribute-name">name</span>=<span class="string"><span class="delimiter">"</span><span class="content">region</span><span class="delimiter">"</span></span> <span class="attribute-name">type</span>=<span class="string"><span class="delimiter">"</span><span class="content">tns:Region</span><span class="delimiter">"</span></span>
- <span class="attribute-name">maxOccurs</span>=<span class="string"><span class="delimiter">"</span><span class="content">unbounded</span><span class="delimiter">"</span></span> <span class="attribute-name">minOccurs</span>=<span class="string"><span class="delimiter">"</span><span class="content">0</span><span class="delimiter">"</span></span><span class="tag">></span>
+ <span class="attribute-name">maxOccurs</span>=<span class="string"><span class="delimiter">"</span><span class="content">unbounded</span><span class="delimiter">"</span></span> <span class="attribute-name">minOccurs</span>=<span class="string"><span class="delimiter">"</span><span class="content">0</span><span class="delimiter">"</span></span><span class="tag">></span>
<span class="tag"></element></span>
<span class="tag"></sequence></span>
<span class="tag"><attribute</span> <span class="attribute-name">name</span>=<span class="string"><span class="delimiter">"</span><span class="content">name</span><span class="delimiter">"</span></span> <span class="attribute-name">type</span>=<span class="string"><span class="delimiter">"</span><span class="content">string</span><span class="delimiter">"</span></span><span class="tag">></span><span class="tag"></attribute></span>
@@ -17654,7 +17829,7 @@ a row, get a column value, perform a query, and do some additional HBase operati
<span class="comment">//*drop if table is already exist.*</span>
<span class="keyword">if</span>(dbo.isTableExist(<span class="string"><span class="delimiter">"</span><span class="content">user</span><span class="delimiter">"</span></span>)){
- dbo.deleteTable(<span class="string"><span class="delimiter">"</span><span class="content">user</span><span class="delimiter">"</span></span>);
+ dbo.deleteTable(<span class="string"><span class="delimiter">"</span><span class="content">user</span><span class="delimiter">"</span></span>);
}
<span class="comment">//*create table*</span>
@@ -18841,25 +19016,165 @@ values for this row for all column families.</p>
<h2 id="_sparksql_dataframes"><a class="anchor" href="#_sparksql_dataframes"></a>86. SparkSQL/DataFrames</h2>
<div class="sectionbody">
<div class="paragraph">
-<p><a href="http://spark.apache.org/sql/">SparkSQL</a> is a subproject of Spark that supports
-SQL that will compute down to a Spark DAG. In addition,SparkSQL is a heavy user
-of DataFrames. DataFrames are like RDDs with schema information.</p>
+<p>The HBase-Spark Connector (in the HBase-Spark module) leverages the
+<a href="https://databricks.com/blog/2015/01/09/spark-sql-data-sources-api-unified-data-access-for-the-spark-platform.html">DataSource API</a>
+(<a href="https://issues.apache.org/jira/browse/SPARK-3247">SPARK-3247</a>)
+introduced in Spark 1.2.0. It bridges the gap between the simple HBase key-value store and complex
+relational SQL queries, and enables users to perform complex data analysis
+on top of HBase using Spark. An HBase DataFrame is a standard Spark DataFrame and can
+interact with any other data source, such as Hive, ORC, Parquet, or JSON.
+The HBase-Spark Connector applies critical techniques such as partition pruning, column pruning,
+predicate pushdown, and data locality.</p>
+</div>
+<div class="paragraph">
+<p>To use the HBase-Spark connector, users define the catalog for the schema mapping
+between HBase and Spark tables, prepare the data and populate the HBase table,
+and then load the HBase DataFrame. After that, users can run integrated queries and access records
+in the HBase table with SQL. The following sections illustrate the basic procedure.</p>
+</div>
+<div class="sect2">
+<h3 id="_define_catalog"><a class="anchor" href="#_define_catalog"></a>86.1. Define catalog</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">def catalog = s"""{
+       |"table":{"namespace":"default", "name":"table1"},
+       |"rowkey":"key",
+       |"columns":{
+         |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+         |"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
+         |"col2":{"cf":"cf2", "col":"col2", "type":"double"},
+         |"col3":{"cf":"cf3", "col":"col3", "type":"float"},
+         |"col4":{"cf":"cf4", "col":"col4", "type":"int"},
+         |"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
+         |"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
+         |"col7":{"cf":"cf7", "col":"col7", "type":"string"},
+         |"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
+       |}
+     |}""".stripMargin</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The catalog defines a mapping between HBase and Spark tables. It has two critical parts:
+the row key definition, and the mapping between each table column in Spark and
+a column family and column qualifier in HBase. The catalog above defines a schema for an HBase table
+named <code>table1</code>, with <code>key</code> as the row key and a number of columns (<code>col1</code> to <code>col8</code>). Note that the row key
+also has to be defined in detail as a column (<code>col0</code>), which has a specific cf (<code>rowkey</code>).</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_save_the_dataframe"><a class="anchor" href="#_save_the_dataframe"></a>86.2. Save the DataFrame</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">case class HBaseRecord(
+  col0: String,
+  col1: Boolean,
+  col2: Double,
+  col3: Float,
+  col4: Int,
+  col5: Long,
+  col6: Short,
+  col7: String,
+  col8: Byte)
+
+object HBaseRecord {
+  // Build a sample record; the row key is "row" plus the zero-padded index.
+  def apply(i: Int, t: String): HBaseRecord = {
+    val s = s"""row${"%03d".format(i)}"""
+    HBaseRecord(s,
+      i % 2 == 0,
+      i.toDouble,
+      i.toFloat,
+      i,
+      i.toLong,
+      i.toShort,
+      s"String$i: $t",
+      i.toByte)
+  }
+}
+
+val data = (0 to 255).map { i => HBaseRecord(i, "extra") }
+
+sc.parallelize(data).toDF.write.options(
+  Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
+  .format("org.apache.hadoop.hbase.spark")
+  .save()</code></pre>
+</div>
</div>
<div class="paragraph">
-<p>The HBase-Spark module includes support for Spark SQL and DataFrames, which allows
-you to write SparkSQL directly on HBase tables. In addition the HBase-Spark
-will push down query filtering logic to HBase.</p>
+<p><code>data</code>, prepared by the user, is a local Scala collection of 256 HBaseRecord objects.
+<code>sc.parallelize(data)</code> distributes <code>data</code> to form an RDD, and <code>toDF</code> converts it to a DataFrame.
+The <code>write</code> function returns a DataFrameWriter used to write the DataFrame to external storage
+systems (here, HBase). Given a DataFrame with the schema specified by <code>catalog</code>, the <code>save</code> function
+creates an HBase table with 5 regions (the <code>newTable</code> option set to "5") and saves the DataFrame in it.</p>
+</div>
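+<div class="paragraph">
+<p><code>HBaseRecord.apply</code> builds row keys with <code>"%03d".format(i)</code>. HBase stores rows sorted lexicographically by row key bytes, so zero-padding keeps the string order aligned with the numeric order; without it, <code>row10</code> would sort before <code>row2</code>. The plain-Scala sketch below (no Spark or HBase needed) checks this, using Java string comparison as a stand-in for byte order, which coincides for ASCII keys like these.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">// Compare padded and unpadded row keys under lexicographic ordering.
+object RowKeyOrderSketch {
+  // Same string HBaseRecord.apply builds with s"""row${"%03d".format(i)}"""
+  def padded(i: Int): String = "row%03d".format(i)
+  def unpadded(i: Int): String = "row" + i
+
+  def main(args: Array[String]): Unit = {
+    val ids = 0 to 255
+    // Sorting Strings compares UTF-16 code units; for ASCII keys this matches HBase byte order.
+    val paddedInOrder = ids.map(padded).sorted == ids.map(padded)
+    val unpaddedInOrder = ids.map(unpadded).sorted == ids.map(unpadded)
+    println(s"padded keys keep numeric order: $paddedInOrder")
+    println(s"unpadded keys keep numeric order: $unpaddedInOrder")
+    println(ids.map(unpadded).sorted.take(4).mkString(", "))
+  }
+}</code></pre>
+</div>
+</div>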
+</div>
+<div class="sect2">
+<h3 id="_load_the_dataframe"><a class="anchor" href="#_load_the_dataframe"></a>86.3. Load the DataFrame</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">def withCatalog(cat: String): DataFrame = {
+ sqlContext
+ .read
+ .options(Map(HBaseTableCatalog.tableCatalog->cat))
+ .format("org.apache.hadoop.hbase.spark")
+ .load()
+}
+val df = withCatalog(catalog)</code></pre>
+</div>
</div>
<div class="paragraph">
-<p>In HBaseSparkConf, four parameters related to timestamp can be set. They are TIMESTAMP,
-MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS respectively. Users can query records
-with different timestamps or time ranges with MIN_TIMESTAMP and MAX_TIMESTAMP.
-In the meantime, use concrete value instead of tsSpecified and oldMs in the examples below.</p>
+<p>In the <code>withCatalog</code> function, <code>sqlContext</code> is a SQLContext, the entry point
+for working with structured data (rows and columns) in Spark.
+<code>read</code> returns a DataFrameReader that can be used to read data in as a DataFrame.
+The <code>options</code> function adds input options for the underlying data source to the DataFrameReader,
+and the <code>format</code> function specifies the input data source format for the DataFrameReader.
+The <code>load()</code> function loads the input as a DataFrame. The DataFrame <code>df</code> returned
+by the <code>withCatalog</code> function can be used to access the HBase table, as shown in 86.4 and 86.5.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_language_integrated_query"><a class="anchor" href="#_language_integrated_query"></a>86.4. Language Integrated Query</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">val s = df.filter(($"col0" <= "row050" && $"col0" > "row040") ||
+ $"col0" === "row005" ||
+ $"col0" <= "row005")
+ .select("col0", "col1", "col4")
+s.show</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>A DataFrame supports various operations, such as join, sort, select, filter and orderBy.
+<code>df.filter</code> above filters rows using the given column expression, and <code>select</code> selects a set of columns:
+<code>col0</code>, <code>col1</code> and <code>col4</code>.</p>
+</div>
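+<div class="paragraph">
+<p>To see which rows the predicate above keeps, the sketch below evaluates the same boolean expression over the 256 row keys written in 86.2, using plain Scala collections instead of a DataFrame. It keeps <code>row000</code> to <code>row005</code> and <code>row041</code> to <code>row050</code>, 16 keys in total; the <code>col0 === "row005"</code> clause adds nothing new because <code>row005</code> is already at most <code>row005</code>. The sketch does not model how the connector pushes such row-key conditions down to HBase.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">// Evaluate the df.filter predicate above on plain strings (no Spark needed).
+object FilterSketch {
+  // Row keys written in 86.2: row000 ... row255
+  val keys: Seq[String] = (0 to 255).map(i => "row%03d".format(i))
+
+  private def le(a: String, b: String): Boolean = a.compareTo(b) <= 0
+  private def gt(a: String, b: String): Boolean = a.compareTo(b) > 0
+
+  // Mirrors: ($"col0" <= "row050" && $"col0" > "row040") || $"col0" === "row005" || $"col0" <= "row005"
+  def matches(col0: String): Boolean =
+    (le(col0, "row050") && gt(col0, "row040")) ||
+      col0 == "row005" ||
+      le(col0, "row005")
+
+  def main(args: Array[String]): Unit = {
+    val selected = keys.filter(matches)
+    println(s"${selected.size} rows: ${selected.mkString(", ")}")
+  }
+}</code></pre>
+</div>
+</div>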
+</div>
+<div class="sect2">
+<h3 id="_sql_query"><a class="anchor" href="#_sql_query"></a>86.5. SQL Query</h3>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">df.registerTempTable("table1")
+sqlContext.sql("select count(col1) from table1").show</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p><code>registerTempTable</code> registers <code>df</code> DataFrame as a temporary table using the table name <code>table1</code>.
+The lifetime of this temporary table is tied to the SQLContext that was used to create <code>df</code>.
+<code>sqlContext.sql</code> function allows the user to execute SQL queries.</p>
+</div>
</div>
+<div class="sect2">
+<h3 id="_others"><a class="anchor" href="#_others"></a>86.6. Others</h3>
<div class="exampleblock">
<div class="title">Example 52. Query with different timestamps</div>
<div class="content">
<div class="paragraph">
+<p>In HBaseSparkConf, four parameters related to timestamps can be set: TIMESTAMP,
+MIN_TIMESTAMP, MAX_TIMESTAMP and MAX_VERSIONS. Users can query records with a specific
+timestamp using TIMESTAMP, or within a time range using MIN_TIMESTAMP and MAX_TIMESTAMP.
+In the examples below, replace tsSpecified and oldMs with concrete values.</p>
+</div>
+<div class="paragraph">
<p>The example below shows how to load df DataFrame with different timestamps.
tsSpecified is specified by the user.
HBaseTableCatalog defines the schema mapping between the HBase table and the Spark relation.
@@ -18867,10 +19182,10 @@ writeCatalog defines catalog for the schema mapping.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre>val df = sqlContext.read
+<pre class="CodeRay highlight"><code data-lang="scala">val df = sqlContext.read
.options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.TIMESTAMP -> tsSpecified.toString))
- .format("org.apache.hadoop.hbase.spark")
- .load()</pre>
+ .format("org.apache.hadoop.hbase.spark")
+ .load()</code></pre>
</div>
</div>
<div class="paragraph">
@@ -18879,11 +19194,11 @@ oldMs is specified by the user.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre>val df = sqlContext.read
- .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
+<pre class="CodeRay highlight"><code data-lang="scala">val df = sqlContext.read
+ .options(Map(HBaseTableCatalog.tableCatalog -> writeCatalog, HBaseSparkConf.MIN_TIMESTAMP -> "0",
HBaseSparkConf.MAX_TIMESTAMP -> oldMs.toString))
- .format("org.apache.hadoop.hbase.spark")
- .load()</pre>
+ .format("org.apache.hadoop.hbase.spark")
+ .load()</code></pre>
</div>
</div>
<div class="paragraph">
@@ -18891,162 +19206,149 @@ oldMs is specified by the user.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre> df.registerTempTable("table")
- sqlContext.sql("select count(col1) from table").show</pre>
+<pre class="CodeRay highlight"><code data-lang="scala">df.registerTempTable("table")
+sqlContext.sql("select count(col1) from table").show</code></pre>
</div>
</div>
</div>
</div>
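+<div class="paragraph">
+<p>The sketch below models, in plain Scala, what these options are expected to select from a single cell with three versions. It is an assumption-based model, not the connector's code: it assumes MIN_TIMESTAMP and MAX_TIMESTAMP behave like an HBase scan time range (minimum inclusive, maximum exclusive), that TIMESTAMP matches exactly one version timestamp, and that MAX_VERSIONS keeps the newest versions first. <code>CellVersion</code> and the helper functions are hypothetical.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="CodeRay highlight"><code data-lang="scala">// Hypothetical model of one HBase cell with several timestamped versions (not connector code).
+case class CellVersion(timestamp: Long, value: String)
+
+object TimeRangeSketch {
+  // Assumed MIN_TIMESTAMP/MAX_TIMESTAMP semantics: HBase-style time range, min inclusive, max exclusive.
+  def inRange(versions: Seq[CellVersion], minTs: Long, maxTs: Long): Seq[CellVersion] =
+    versions.filter(v => v.timestamp >= minTs && v.timestamp < maxTs)
+
+  // Assumed TIMESTAMP semantics: only the version written at exactly that timestamp.
+  def atTimestamp(versions: Seq[CellVersion], ts: Long): Seq[CellVersion] =
+    versions.filter(_.timestamp == ts)
+
+  // Assumed MAX_VERSIONS semantics: keep at most n versions, newest first.
+  def newest(versions: Seq[CellVersion], n: Int): Seq[CellVersion] =
+    versions.sortBy(v => -v.timestamp).take(n)
+
+  def main(args: Array[String]): Unit = {
+    val versions = Seq(CellVersion(100L, "a"), CellVersion(200L, "b"), CellVersion(300L, "c"))
+    val tsSpecified = 200L // concrete value standing in for tsSpecified
+    val oldMs = 250L       // concrete value standing in for oldMs
+    println(atTimestamp(versions, tsSpecified).map(_.value)) // like TIMESTAMP -> tsSpecified
+    println(inRange(versions, 0L, oldMs).map(_.value))       // like MIN_TIMESTAMP -> 0, MAX_TIMESTAMP -> oldMs
+    println(newest(versions, 2).map(_.value))                // like MAX_VERSIONS -> 2
+  }
+}</code></pre>
+</div>
+</div>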
-<div class="sect2">
-<h3 id="_predicate_push_down"><a class="anchor" href="#_predicate_push_down"></a>86.1. Predicate Push Down</h3>
+<div class="exampleblock">
+<div class="title">Example 53. Native Avro support</div>
+<div class="content">
<div class="paragraph">
-<p>There are two examples of predicate push down in the HBase-Spark implementation.
-The first example shows the push down of filtering logic on the RowKey. HBase-Spark
-will reduce the filters on RowKeys down to a set of Get and/or Scan commands.</p>
-</div>
-<div class="admonitionblock note">
-<table>
-<tr>
-<td class="icon">
-<i class="fa icon-note" title="Note"></i>
-</td>
-<td class="content">
-The Scans are distributed scans, rather than a single client scan operation.
-</td>
-</tr>
-</table>
+<p>The HBase-Spark Connector supports different data formats, such as Avro and JSON. The use case below
+shows how Spark supports Avro. Users can persist Avro records into HBase directly. Internally,
+the Avro schema is converted to a native Spark Catalyst data type automatically.
+Note that both the key and value parts of an HBase table can be defined in Avro format.</p>
</div>
<div class="paragraph">
-<p>If the query looks something like the following, the logic will push down and get
-the rows through 3 Gets and 0 Scans. We can do gets because all the operations
-are <code>equal</code> operations.</p>
+<p>1) Define catalog for the schema mapping:</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="CodeRay highlight"><code data-lang="sql"><span class="class">SELECT</span>
- KEY_FIELD,
- B_FIELD,
- A_FIELD
-<span class="keyword">FROM</span> hbaseTmp
-<span class="keyword">WHERE</span> (KEY_FIELD = <span class="string"><span class="delimiter">'</span><span class="content">get1</span><span class="delimiter">'</span></span> <span class="keyword">or</span> KEY_FIELD = <span class="string"><span class="delimiter">'</span><span class="content">get2</span><span class="delimiter">'</span></span> <span class="keyword">or</span> KEY_FIELD = <span class="string"><span class="delimiter">'</span><span class="content">get3</span><span class="delimiter">'</span></span>)</code></pre>
+<pre class="CodeRay highlight"><code data-lang="scala">def catalog = s"""{
+ |"table":{"namespace":"default", "name":"Avrotable"},
+ |"rowkey":"key",
+ |"columns":{
+ |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+ |"col1":{"cf":"cf1", "col":"col1", "type":"binary"}
+ |}
+ |}""".stripMargin</code></pre>
</div>
</div>
<div class="paragraph">
-<p>Now let’s look at an example where we will end up doing two scans on HBase.</p>
+<p><code>catalog</code> is a schema for an HBase table named <code>Avrotable</code>, with row key key and
+one column, col1. The row key also has to be defined in detail as a column (col0),
+which has the specific cf <code>rowkey</code>.</p>
+</div>
+<div class="paragraph">
+<p>2) Prepare the Data:</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="CodeRay highlight"><code data-lang="sql"><span class="class">SELECT</span>
- KEY_FIELD,
- B_FIELD,
- A_FIELD
-<span class="keyword">FROM</span> hbaseTmp
-<span class="keyword">WHERE</span> KEY_FIELD < <span class="string"><span class="delimiter">'</span><span class="content">get2</span><span class="delimiter">'</span></span> <span class="keyword">or</span> KEY_FIELD > <span class="string"><span class="delimiter">'</span><span class="content">get3</span><span class="delimiter">'</span></span></code></pre>
+<pre class="CodeRay highlight"><code data-lang="scala">import org.apache.avro.Schema
+import org.apache.avro.generic.GenericData
+
+// Row key plus the Avro-serialized User record as raw bytes (col1 is "binary" in the catalog).
+// Define the case class and its companion object together (e.g. with :paste in the Spark shell).
+case class AvroHBaseRecord(col0: String, col1: Array[Byte])
+
+object AvroHBaseRecord {
+  val schemaString =
+    s"""{"namespace": "example.avro",
+       |   "type": "record",      "name": "User",
+       |    "fields": [
+       |        {"name": "name", "type": "string"},
+       |        {"name": "favorite_number",  "type": ["int", "null"]},
+       |        {"name": "favorite_color", "type": ["string", "null"]},
+       |        {"name": "favorite_array", "type": {"type": "array", "items": "string"}},
+       |        {"name": "favorite_map", "type": {"type": "map", "values": "int"}}
+       |      ]    }""".stripMargin
+
+  val avroSchema: Schema = {
+    val p = new Schema.Parser
+    p.parse(schemaString)
+  }
+
+  def apply(i: Int): AvroHBaseRecord = {
+    val user = new GenericData.Record(avroSchema)
+    user.put("name", s"name${"%03d".format(i)}")
+    user.put("favorite_number", i)
+    user.put("favorite_color", s"color${"%03d".format(i)}")
+    val favoriteArray = new GenericData.Array[String](2, avroSchema.getField("favorite_array").schema())
+    favoriteArray.add(s"number${i}")
+    favoriteArray.add(s"number${i+1}")
+    user.put("favorite_array", favoriteArray)
+    import collection.JavaConverters._
+    val favoriteMap = Map[String, Int](("key1" -> i), ("key2" -> (i+1))).asJava
+    user.put("favorite_map", favoriteMap)
+    // AvroSedes is a serialization helper that is not shown here; it must be on the classpath.
+    val avroByte = AvroSedes.serialize(user, avroSchema)
+    AvroHBaseRecord(s"name${"%03d".format(i)}", avroByte)
+  }
+}
+
+val data = (0 to 255).map { i =>
+  AvroHBaseRecord(i)
+}</code></pre>
</div>
</div>
<div class="paragraph">
-<p>In this example we will get 0 Gets and 2 Scans. One scan will load everything
-from the first row in the table until \u201cget2\u201d and the second scan will get
-everything from \u201cget3\u201d until the last row in the table.</p>
+<p><code>schemaString</code> is defined first, then it is parsed to get <code>avroSchema</code>. <code>avroSchema</code> is used to
+generate <code>AvroHBaseRecord</code>. <code>data</code> prepared by users is a local Scala collection
+which has 256 <code>AvroHBaseRecord</code> objects.</p>
</div>
<div class="paragraph">
-<p>The next query is a good example of having a good deal of range checks. However
-the ranges overlap. To the code will be smart enough to get the following data
-in a single scan that encompasses all the data asked by the query.</p>
+<p>3) Save DataFrame:</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="CodeRay highlight"><code data-lang="sql"><span class="class">SELECT</span>
- KEY_FIELD,
- B_FIELD,
- A_FIELD
-<span class="keyword">FROM</span> hbaseTmp
-<span class="keyword">WHERE</span>
- (KEY_FIELD >= <span class="string"><span class="delimiter">'</span><span class="content">get1</span><span class="delimiter">'</span></span> <span class="keyword">and</span> KEY_FIELD <= <span class="string"><span class="delimiter">'</span><span class="content">get3</span><span class="delimiter">'</span></span>) <span class="keyword">or</span>
- (KEY_FIELD > <span class="string"><span class="delimiter">'</span><span class="content">get3</span><span class="delimiter">'</span></span> <span class="keyword">and</span> KEY_FIELD <= <span class="string"><span class="delimiter">'</span><span class="content">get5</span><span class="delimiter">'</span></span>)</code></pre>
+<pre class="CodeRay highlight"><code data-lang="scala"> sc.parallelize(data).toDF.write.options(
+ Map(HBaseTableCatalog.tableCatalog -> catalog, HBaseTableCatalog.newTable -> "5"))
+ .format("org.apache.spark.sql.execution.datasources.hbase")
+ .save()</code></pre>
</div>
</div>
<div class="paragraph">
-<p>The second example of push down functionality offered by the HBase-Spark module
-is the ability to push down filter logic for column and cell fields. Just like
-the RowKey logic, all query logic will be consolidated into the minimum number
-of range checks and equal checks by sending a Filter object along with the Scan
-with information about consolidated push down predicates</p>
+<p>Given a DataFrame with the schema specified by <code>catalog</code>, the code above creates an HBase table with 5
+regions and saves the DataFrame in it.</p>
</div>
-<div class="exampleblock">
-<div class="title">Example 53. SparkSQL Code Example</div>
-<div class="content">
<div class="paragraph">
-<p>This example shows how we can interact with HBase with SQL.</p>
+<p>4) Load the DataFrame</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="CodeRay highlight"><code data-lang="scala">val sc = new SparkContext("local", "test")
-val config = new HBaseConfiguration()
-
-new HBaseContext(sc, TEST_UTIL.getConfiguration)
-val sqlContext = new SQLContext(sc)
+<pre class="CodeRay highlight"><code data-lang="scala">def avroCatalog = s"""{
+ |"table":{"namespace":"default", "name":"Avrotable"},
+ |"rowkey":"key",
+ |"columns":{
+ |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
+ |"col1":{"cf":"cf1", "col":"col1", "avro":"avroSchema"}
+ |}
+ |}""".stripMargin
-df = sqlContext.load("org.apache.hadoop.hbase.spark",
- Map("hbase.columns.mapping" ->
- "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b",
- "hbase.table" -> "t1"))
-
-df.registerTempTable("hbaseTmp")
-
-val results = sqlContext.sql("SELECT KEY_FIELD, B_FIELD FROM hbaseTmp " +
- "WHERE " +
- "(KEY_FIELD = 'get1' and B_FIELD < '3') or " +
- "(KEY_FIELD >= 'get3' and B_FIELD = '8')").take(5)</code></pre>
+def withCatalog(cat: String): DataFrame = {
+  sqlContext
+    .read
+    // "avroSchema" is the option key that col1 refers to in the catalog ("avro":"avroSchema")
+    .options(Map("avroSchema" -> AvroHBaseRecord.schemaString, HBaseTableCatalog.tableCatalog -> cat))
+    .format("org.apache.spark.sql.execution.datasources.hbase")
+    .load()
+}
+val df = withCatalog(avroCatalog)</code></pre>
</div>
</div>
<div class="paragraph">
-<p>There are three major parts of this example that deserve explaining.</p>
-</div>
-<div class="dlist">
-<dl>
-<dt class="hdlist1">The sqlContext.load function</dt>
-<dd>
-<p>In the sqlContext.load function we see two
-parameters. The first of these parameters is pointing Spark to the HBase
-DefaultSource class that will act as the interface between SparkSQL and HBase.</p>
-</dd>
-<dt class="hdlist1">A map of key value pairs</dt>
-<dd>
-<p>In this example we have two keys in our map, <code>hbase.columns.mapping</code> and
-<code>hbase.table</code>. The <code>hbase.table</code> directs SparkSQL to use the given HBase table.
-The <code>hbase.columns.mapping</code> key give us the logic to translate HBase columns to
-SparkSQL columns.</p>
-<div class="paragraph">
-<p>The <code>hbase.columns.mapping</code> is a string that follows the following format</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="CodeRay highlight"><code data-lang="scala">(SparkSQL.ColumnName) (SparkSQL.ColumnType) (HBase.ColumnFamily):(HBase.Qualifier)</code></pre>
-</div>
+<p>In the <code>withCatalog</code> function, <code>read</code> returns a DataFrameReader that can be used to read data in as a DataFrame.
+The <code>options</code> function adds input options for the underlying data source to the DataFrameReader.
+There are two options: one sets <code>avroSchema</code> to <code>AvroHBaseRecord.schemaString</code>, and the other
+sets <code>HBaseTableCatalog.tableCatalog</code> to <code>avroCatalog</code>, in which col1 is declared with <code>"avro":"avroSchema"</code> instead of a type.
+The <code>load()</code> function loads the input as a DataFrame.
+The DataFrame <code>df</code> returned by the <code>withCatalog</code> function can be used to access the HBase table.</p>
</div>
<div class="paragraph">
-<p>In the example below we see the definition of three fields. Because KEY_FIELD has
-no ColumnFamily, it is the RowKey.</p>
+<p>5) SQL Query</p>
</div>
<div class="listingblock">
<div class="content">
-<pre>KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b</pre>
-</div>
+<pre class="CodeRay highlight"><code data-lang="scala">df.registerTempTable("avrotable")
+val c = sqlContext.sql("select count(1) from avrotable")
+c.show</code></pre>
</div>
-</dd>
-<dt class="hdlist1">The registerTempTable function</dt>
-<dd>
-<p>This is a SparkSQL function that allows us now to be free of Scala when accessing
-our HBase table directly with SQL with the table name of "hbaseTmp".</p>
-</dd>
-</dl>
</div>
<div class="paragraph">
-<p>The last major point to note in the example is the <code>sqlContext.sql</code> function, which
-allows the user to ask their questions in SQL which will be pushed down to the
-DefaultSource code in the HBase-Spark module. The result of this command will be
-a DataFrame with the Schema of KEY_FIELD and B_FIELD.</p>
+<p>After loading the <code>df</code> DataFrame, users can query the data. <code>registerTempTable</code> registers <code>df</code>
+as a temporary table named <code>avrotable</code>. The <code>sqlContext.sql</code> function allows the
+user to execute SQL queries.</p>
</div>
</div>
</div>
@@ -19678,7 +19980,7 @@ and <code>salaryDet</code>, containing personal and salary details. Below is the
of the <code>users</code> table.</p>
</div>
<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 11. Users Table</caption>
+<caption class="title">Table 17. Users Table</caption>
<colgroup>
<col style="width: 14%;">
<col style="width: 14%;">
@@ -28353,7 +28655,7 @@ End-of-life releases are not included in this list.
</table>
</div>
<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 12. Release Managers</caption>
+<caption class="title">Table 18. Release Managers</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
@@ -30478,7 +30780,7 @@ The following cheat sheet is included for your reference. More nuanced and compr
is available at <a href="http://asciidoctor.org/docs/user-manual/" class="bare">http://asciidoctor.org/docs/user-manual/</a>.</p>
</div>
<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 13. AsciiDoc Cheat Sheet</caption>
+<caption class="title">Table 19. AsciiDoc Cheat Sheet</caption>
<colgroup>
<col style="width: 33%;">
<col style="width: 33%;">
@@ -31529,7 +31831,7 @@ In case the table goes out of date, the unit tests which check for accuracy of p
</dl>
</div>
<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 14. ACL Matrix</caption>
+<caption class="title">Table 20. ACL Matrix</caption>
<colgroup>
<col style="width: 33%;">
<col style="width: 33%;">
@@ -32991,7 +33293,7 @@ Note that the size of the trailer is different depending on the version, so it i
However, the version is always stored as the last four-byte integer in the file.</p>
</div>
<table class="tableblock frame-all grid-all spread">
-<caption class="title">Table 15. Differences between HFile Versions 1 and 2</caption>
+<caption class="title">Table 21. Differences between HFile Versions 1 and 2</caption>
<colgroup>
<col style="width: 50%;">
<col style="width: 50%;">
http://git-wip-us.apache.org/repos/asf/hbase-site/blob/975096b1/bulk-loads.html
----------------------------------------------------------------------
diff --git a/bulk-loads.html b/bulk-loads.html
index 6b09bad..b4c1ea7 100644
--- a/bulk-loads.html
+++ b/bulk-loads.html
@@ -7,7 +7,7 @@
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
- <meta name="Date-Revision-yyyymmdd" content="20160712" />
+ <meta name="Date-Revision-yyyymmdd" content="20160714" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache HBase –
Bulk Loads in Apache HBase (TM)
@@ -305,7 +305,7 @@ under the License. -->
<a href="http://www.apache.org/">The Apache Software Foundation</a>.
All rights reserved.
- <li id="publishDate" class="pull-right">Last Published: 2016-07-12</li>
+ <li id="publishDate" class="pull-right">Last Published: 2016-07-14</li>
</p>
</div>