Posted to commits@jena.apache.org by bu...@apache.org on 2014/11/26 18:05:30 UTC

svn commit: r930613 - in /websites/staging/jena/trunk/content: ./ documentation/hadoop/common.html

Author: buildbot
Date: Wed Nov 26 17:05:30 2014
New Revision: 930613

Log:
Staging update by buildbot for jena

Added:
    websites/staging/jena/trunk/content/documentation/hadoop/common.html
Modified:
    websites/staging/jena/trunk/content/   (props changed)

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Nov 26 17:05:30 2014
@@ -1 +1 @@
-1641851
+1641857

Added: websites/staging/jena/trunk/content/documentation/hadoop/common.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/common.html (added)
+++ websites/staging/jena/trunk/content/documentation/hadoop/common.html Wed Nov 26 17:05:30 2014
@@ -0,0 +1,186 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <title>Apache Jena - RDF Tools for Apache Hadoop - Common API</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
+  <link href="/css/jena.css" rel="stylesheet" type="text/css">
+  <link rel="shortcut icon" href="/images/favicon.ico" />
+  
+  <script src="https://code.jquery.com/jquery-2.0.3.min.js"></script>
+  <script src="/js/jena-navigation.js" type="text/javascript"></script>
+  <script src="/js/bootstrap.min.js" type="text/javascript"></script>
+  <script src="/js/breadcrumbs.js" type="text/javascript"></script>
+
+  
+  <!-- Uncomment to enable code coloring <link href="/css/codehilite.css" rel="stylesheet" type="text/css"> -->
+
+</head>
+
+<body>
+
+
+
+<nav class="navbar navbar-default" role="navigation">
+<div class="container">
+  <div class="navbar-header">
+  
+        <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+    </button>
+ 	<a class="navbar-brand" href="/index.html">
+		<img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
+  </div>
+ 
+
+
+  <div class="collapse navbar-collapse navbar-ex1-collapse">
+    <ul class="nav navbar-nav">
+
+
+
+              <li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
+              <li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li class="dropdown-header">Tutorials</li>
+                  <li><a href="/tutorials/index.html">Overview</a></li>
+                  <li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
+                  <li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
+                  <li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
+                  <li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
+                  <li><a href="/documentation/notes/index.html">How-To's</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">References</li>
+                  <li><a href="/documentation/index.html">Overview</a></li>
+                  <li><a href="/documentation/javadoc/">Javadoc</a></li>
+                  <li><a href="/documentation/rdf/index.html">RDF API</a></li>
+                  <li><a href="/documentation/io/">RDF I/O</a></li>
+                  <li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
+                  <li><a href="/documentation/query/text-query.html">Text Search</a></li>
+                  <li><a href="/documentation/tdb/index.html">TDB</a></li>
+		  <li><a href="/documentation/sdb/index.html">SDB</a></li>
+		  <li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
+		  <li><a href="/documentation/security/index.html">Security</a></li>
+                  <li><a href="/documentation/serving_data/index.html">Fuseki</a></li>
+                  <li><a href="/documentation/assembler/index.html">Assembler</a></li>
+                  <li><a href="/documentation/ontology/">Ontology API</a></li>
+                  <li><a href="/documentation/inference/index.html">Inference API</a></li>
+                  <li><a href="/documentation/tools/index.html">Command-line tools</a></li>
+                  <li><a href="/documentation/extras/index.html">Extras</a></li>
+                </ul>
+              </li>
+
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
+                  <li><a href="/documentation/javadoc/arq/">ARQ</a></li>
+                  <li><a href="/documentation/javadoc/tdb/">TDB</a></li>
+                  <li><a href="/documentation/javadoc/text/">Text Search</a></li>
+                  <li><a href="/documentation/javadoc/spatial/">Spatial Search</a></li>
+                  <li><a href="/documentation/javadoc/security/">Security</a></li>
+                  <li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
+                  <li><a href="/documentation/javadoc/sdb/">SDB</a></li>
+                  <li><a href="/documentation/javadoc/fuseki/">Fuseki</a></li>
+                </ul>
+              </li>
+
+              <li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
+              
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/getting_involved/index.html">Contribute</a></li>
+                  <li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">Project</li>
+                  <li><a href="/about_jena/about.html">About Jena</a></li>
+                  <li><a href="/about_jena/roadmap.html">Roadmap</a></li>
+                  <li><a href="/about_jena/architecture.html">Architecture</a></li>
+                  <li><a href="/about_jena/team.html">Project team</a></li>
+                  <li><a href="/about_jena/contributions.html">Related projects</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">ASF</li>
+                  <li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
+                  <li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+                  <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+                  <li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+                  <li><a href="http://www.apache.org/security/">Security</a></li>
+                </ul>
+              </li>
+
+   
+    </ul>
+  </div>
+</div>
+</nav>
+
+
+<div class="container">
+	<div class="row">
+	<div class="col-md-12">
+	<div id="breadcrumbs"></div>
+	<h1 class="title">RDF Tools for Apache Hadoop - Common API</h1>
+  <p>The Common API provides the basic data model for representing RDF data within Hadoop applications.  This primarily takes the form of <code>Writable</code> implementations and the necessary machinery to efficiently serialise and deserialise these.</p>
+<p>Currently we represent the three main RDF primitives - Nodes, Triples and Quads - though in future a wider range of primitives may be supported if we receive contributions to implement them.</p>
+<h1 id="rdf-primitives">RDF Primitives</h1>
+<h2 id="nodes">Nodes</h2>
+<p>The <code>Writable</code> type for nodes is, predictably enough, called <code>NodeWritable</code>.  It implements the <code>WritableComparable</code> interface, which means it can be used as a key and/or a value in Map/Reduce.  In standard Hadoop style, a <code>get()</code> method returns the actual value as a Jena <code>Node</code> instance, while a corresponding <code>set()</code> method allows the value to be set.</p>
+<p>Note that nodes are lazily converted to and from the underlying binary representation, so there is minimal overhead if you create a <code>NodeWritable</code> instance that is never actually read or written.</p>
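+<p>As a rough sketch of what lazy conversion means in general, the following plain-Java class keeps a serialised form and only decodes it on the first <code>get()</code> call.  It assumes nothing about how <code>NodeWritable</code> is actually implemented: <code>LazyValueSketch</code>, its methods and the UTF-8 string payload are all hypothetical stand-ins chosen for illustration.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Objects;

// Hypothetical illustration of lazy conversion; this is NOT Jena's NodeWritable code.
final class LazyValueSketch {
    private byte[] bytes;    // serialised form, null until it is needed
    private String value;    // decoded form, null until it is needed
    private int decodeCount; // how many times decoding actually ran

    private LazyValueSketch(byte[] bytes, String value) {
        this.bytes = bytes;
        this.value = value;
    }

    // Wrap serialised data without decoding it.
    static LazyValueSketch fromBytes(byte[] bytes) {
        return new LazyValueSketch(Objects.requireNonNull(bytes).clone(), null);
    }

    // Wrap an in-memory value without encoding it.
    static LazyValueSketch fromValue(String value) {
        return new LazyValueSketch(null, Objects.requireNonNull(value));
    }

    // Decode on first access only; later calls reuse the cached value.
    String get() {
        if (value == null) {
            value = new String(bytes, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return value;
    }

    // Encode on first request only; later calls reuse the cached bytes.
    byte[] toBytes() {
        if (bytes == null) {
            bytes = value.getBytes(StandardCharsets.UTF_8);
        }
        return bytes.clone();
    }

    int decodeCount() {
        return decodeCount;
    }

    public static void main(String[] args) {
        byte[] wire = "http://example.org/subject".getBytes(StandardCharsets.UTF_8);
        LazyValueSketch lazy = LazyValueSketch.fromBytes(wire);
        System.out.println("decodes before get(): " + lazy.decodeCount());
        System.out.println("value: " + lazy.get());
        lazy.get();
        System.out.println("decodes after two get() calls: " + lazy.decodeCount());
    }
}
```

+<p>An instance built with <code>fromBytes()</code> that is never passed to <code>get()</code> never pays the decoding cost, which is the kind of saving lazy conversion is meant to provide.</p>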
+<p><code>NodeWritable</code> supports, and automatically registers itself for, Hadoop's <a href="https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/io/WritableComparator.html"><code>WritableComparator</code></a> mechanism.  This allows it to provide high-efficiency binary comparisons on nodes, which helps reduce phases run faster by avoiding unnecessary deserialisation into POJOs.</p>
+<p>However, the downside of this is that the sort order for nodes may not be as natural as the sort order using POJOs or when sorting with SPARQL.  Ultimately this is a performance trade-off, and in our experiments the benefits far outweigh the lack of a more natural sort order.</p>
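+<p>To illustrate why a byte-level sort order can differ from the natural one, here is a minimal sketch that uses plain Java integers rather than Jena nodes.  The names <code>BinaryOrderSketch</code>, <code>serialise()</code> and <code>compareBytes()</code> are hypothetical, and the 4-byte big-endian encoding is an assumption made for the example; it is not the encoding or comparator that <code>NodeWritable</code> uses.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; NOT the NodeWritable encoding or comparator.
final class BinaryOrderSketch {

    // Serialise an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] serialise(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Lexicographic comparison that treats each byte as unsigned.
    // It works directly on the serialised form, so nothing is deserialised.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Natural order puts -1 before 1; the byte-level order puts it after,
        // because -1 serialises to FF FF FF FF.
        System.out.println("natural: " + Integer.compare(-1, 1));
        System.out.println("binary:  " + compareBytes(serialise(-1), serialise(1)));
    }
}
```

+<p>The two comparisons disagree in sign for this pair of values, which is the kind of less natural ordering that the trade-off accepts in exchange for skipping deserialisation.</p>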
+<h2 id="triples">Triples</h2>
+<p>Again, the <code>Writable</code> type for triples is simply called <code>TripleWritable</code>, and it also implements the <code>WritableComparable</code> interface, meaning it may be used as a key and/or a value.  The standard Hadoop conventions of a <code>get()</code> and a <code>set()</code> method to get/set the value as a Jena <code>Triple</code> are followed.</p>
+<p>Like the other primitives, it is lazily converted to and from the underlying binary representation, and it also supports and registers itself for Hadoop's <code>WritableComparator</code> mechanism.</p>
+<h2 id="quads">Quads</h2>
+<p>Finally, the <code>Writable</code> type for quads is simply called <code>QuadWritable</code>, and it implements the <code>WritableComparable</code> interface, making it usable as a key and/or a value.  As with the other primitives, the standard Hadoop conventions of a <code>get()</code> and a <code>set()</code> method are provided to get/set the value as a Jena <code>Quad</code>.</p>
+<p>Like the other primitives, it is lazily converted to and from the underlying binary representation, and it also supports and registers itself for Hadoop's <code>WritableComparator</code> mechanism.</p>
+<h2 id="arbitrary-sized-tuples">Arbitrary sized tuples</h2>
+<p>In some cases you may have data that is RDF-like but not itself RDF, or that is a mix of triples and quads, in which case you may wish to use the <code>NodeTupleWritable</code>.  This is used to represent an arbitrarily sized tuple consisting of zero or more <code>Node</code> instances.  There is no restriction on the number of nodes per tuple and no requirement that tuple data be uniform.</p>
+<p>Like the other primitives, it implements <code>WritableComparable</code>, so it can be used as a key and/or a value.  However, this primitive does not support binary comparisons, meaning it may not perform as well as the other primitives.</p>
+<p>In this case the <code>get()</code> and <code>set()</code> methods get/set a <code>Tuple&lt;Node&gt;</code> instance, which is a convenience container class provided by ARQ.  Currently the implementation does not support lazy conversion, so the full <code>Tuple&lt;Node&gt;</code> is reconstructed as soon as a <code>NodeTupleWritable</code> instance is deserialised.</p>
+  </div>
+</div>
+
+</div><!--/.container -->
+
+    <footer class="footer">
+      <div class="container">
+        <p>Copyright &copy; 2011&ndash;2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        </p>
+        <p>
+        Apache Jena, Jena, the Apache Jena project logo,
+        Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
+        </p>
+      </div>
+  </footer>
+      
+
+</body>
+</html>