Posted to commits@jena.apache.org by bu...@apache.org on 2014/11/27 12:54:09 UTC

svn commit: r930757 - in /websites/staging/jena/trunk/content: ./ documentation/hadoop/index.html documentation/hadoop/io.html

Author: buildbot
Date: Thu Nov 27 11:54:08 2014
New Revision: 930757

Log:
Staging update by buildbot for jena

Added:
    websites/staging/jena/trunk/content/documentation/hadoop/io.html
Modified:
    websites/staging/jena/trunk/content/   (props changed)
    websites/staging/jena/trunk/content/documentation/hadoop/index.html

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Thu Nov 27 11:54:08 2014
@@ -1 +1 @@
-1642102
+1642121

Modified: websites/staging/jena/trunk/content/documentation/hadoop/index.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/index.html (original)
+++ websites/staging/jena/trunk/content/documentation/hadoop/index.html Thu Nov 27 11:54:08 2014
@@ -149,6 +149,9 @@ you to start writing Hadoop based applic
 often been limited and task specific.  These libraries aim to be as generic as possible and provide the necessary
 infrastructure that enables developers to create their application specific logic without worrying about the
 underlying plumbing.</p>
+<h2 id="beta">Beta</h2>
+<p>These modules are currently considered to be in a <strong>Beta</strong> state; they have been under active development for about a year but have not yet been widely deployed and may contain as yet undiscovered bugs.</p>
+<p>Please see the <a href="../help_and_support/bugs_and_suggestions.html">How to Report a Bug</a> page for how to report any bugs you may encounter.</p>
 <h2 id="documentation">Documentation</h2>
 <ul>
 <li><a href="#overview">Overview</a></li>

Added: websites/staging/jena/trunk/content/documentation/hadoop/io.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/io.html (added)
+++ websites/staging/jena/trunk/content/documentation/hadoop/io.html Thu Nov 27 11:54:08 2014
@@ -0,0 +1,175 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <title>Apache Jena - RDF Tools for Apache Hadoop - IO API</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
+  <link href="/css/jena.css" rel="stylesheet" type="text/css">
+  <link rel="shortcut icon" href="/images/favicon.ico" />
+  
+  <script src="https://code.jquery.com/jquery-2.0.3.min.js"></script>
+  <script src="/js/jena-navigation.js" type="text/javascript"></script>
+  <script src="/js/bootstrap.min.js" type="text/javascript"></script>
+  <script src="/js/breadcrumbs.js" type="text/javascript"></script>
+
+  <script src="/js/improve.js" type="text/javascript"></script>
+
+  
+  <!-- Uncomment to enable code coloring <link href="/css/codehilite.css" rel="stylesheet" type="text/css"> -->
+
+</head>
+
+<body>
+
+
+
+<nav class="navbar navbar-default" role="navigation">
+<div class="container">
+  <div class="navbar-header">
+  
+    <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+    </button>
+    <a class="navbar-brand" href="/index.html">
+    <img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
+  </div>
+ 
+  <div class="collapse navbar-collapse navbar-ex1-collapse">
+    <ul class="nav navbar-nav">
+              <li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
+              <li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li class="dropdown-header">Tutorials</li>
+                  <li><a href="/tutorials/index.html">Overview</a></li>
+                  <li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
+                  <li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
+                  <li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
+                  <li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
+                  <li><a href="/documentation/notes/index.html">How-To's</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">References</li>
+                  <li><a href="/documentation/index.html">Overview</a></li>
+                  <li><a href="/documentation/javadoc/">Javadoc</a></li>
+                  <li><a href="/documentation/rdf/index.html">RDF API</a></li>
+                  <li><a href="/documentation/io/">RDF I/O</a></li>
+                  <li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
+                  <li><a href="/documentation/query/text-query.html">Text Search</a></li>
+                  <li><a href="/documentation/tdb/index.html">TDB</a></li>
+		  <li><a href="/documentation/sdb/index.html">SDB</a></li>
+		  <li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
+		  <li><a href="/documentation/security/index.html">Security</a></li>
+                  <li><a href="/documentation/serving_data/index.html">Fuseki</a></li>
+                  <li><a href="/documentation/assembler/index.html">Assembler</a></li>
+                  <li><a href="/documentation/ontology/">Ontology API</a></li>
+                  <li><a href="/documentation/inference/index.html">Inference API</a></li>
+                  <li><a href="/documentation/tools/index.html">Command-line tools</a></li>
+                  <li><a href="/documentation/extras/index.html">Extras</a></li>
+                </ul>
+              </li>
+
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
+                  <li><a href="/documentation/javadoc/arq/">ARQ</a></li>
+                  <li><a href="/documentation/javadoc/tdb/">TDB</a></li>
+                  <li><a href="/documentation/javadoc/text/">Text Search</a></li>
+                  <li><a href="/documentation/javadoc/spatial/">Spatial Search</a></li>
+                  <li><a href="/documentation/javadoc/security/">Security</a></li>
+                  <li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
+                  <li><a href="/documentation/javadoc/sdb/">SDB</a></li>
+                  <li><a href="/documentation/javadoc/fuseki/">Fuseki</a></li>
+                </ul>
+              </li>
+
+              <li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
+              
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/getting_involved/index.html">Contribute</a></li>
+                  <li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">Project</li>
+                  <li><a href="/about_jena/about.html">About Jena</a></li>
+                  <li><a href="/about_jena/roadmap.html">Roadmap</a></li>
+                  <li><a href="/about_jena/architecture.html">Architecture</a></li>
+                  <li><a href="/about_jena/team.html">Project team</a></li>
+                  <li><a href="/about_jena/contributions.html">Related projects</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">ASF</li>
+                  <li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
+                  <li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+                  <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+                  <li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+                  <li><a href="http://www.apache.org/security/">Security</a></li>
+                </ul>
+              </li>
+
+              <li id="edit"><a href="javascript:improveThisPage(location.href);" title="Improve this Page (Use username anonymous and empty password)"><span class="glyphicon glyphicon-pencil"></span> Improve this Page</a></li>   
+    </ul>
+  </div>
+</div>
+</nav>
+
+
+<div class="container">
+	<div class="row">
+	<div class="col-md-12">
+	<div id="breadcrumbs"></div>
+	<h1 class="title">RDF Tools for Apache Hadoop - IO API</h1>
+  <p>The IO API provides support for reading and writing RDF within Hadoop applications.  This is done by providing <code>InputFormat</code> and <code>OutputFormat</code> implementations that cover all the RDF serialisations that Jena supports.</p>
+<h1 id="background-on-hadoop-io">Background on Hadoop IO</h1>
+<p>If you are already familiar with the Hadoop IO paradigm, please skip this section; if not, please read it, as otherwise some of the later information will not make much sense.</p>
+<p>Hadoop applications, and particularly Map/Reduce, exploit horizontal scalability by dividing input data into <em>splits</em>, where each <em>split</em> represents a portion of the input data that can be read in <em>isolation</em> from the other pieces.  This <em>isolation</em> property is very important to understand: if a file format requires that the entire file be read sequentially in order to interpret it properly, then it cannot be split and must be read as a whole.</p>
+<p>Therefore, depending on the file formats used for your input data, you may not get as much parallelism as you expect, because Hadoop's ability to <em>split</em> the input data may be limited.</p>
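As a rough illustration of why line-based formats such as N-Triples can be split, the following self-contained Java sketch assigns every line of a byte buffer to exactly one split, even when a split boundary falls in the middle of a line. It is loosely modelled on the convention Hadoop's line reader follows (a split skips its leading partial line and reads past its end to finish its last line). The class and method names (`LineSplitSketch`, `readSplit`) are hypothetical; this is not the code Hadoop or Jena actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineSplitSketch {

    /**
     * Returns the complete lines owned by the split covering bytes [start, end).
     * A split that does not start at offset 0 discards everything up to and
     * including the first newline it sees, because the previous split reads
     * that line. A split keeps reading while the next line starts at or before
     * its end offset, so the line crossing the boundary is read exactly once.
     */
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Skip the (possibly partial) first line; the previous split owns it.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step over the newline itself
        }
        while (pos <= end && pos < data.length) {
            int eol = pos;
            while (eol < data.length && data[eol] != '\n') {
                eol++;
            }
            lines.add(new String(data, pos, eol - pos, StandardCharsets.UTF_8));
            pos = eol + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        String nt = "<http://example.org/s> <http://example.org/p> \"1\" .\n"
                  + "<http://example.org/s> <http://example.org/p> \"2\" .\n"
                  + "<http://example.org/s> <http://example.org/p> \"3\" .\n";
        byte[] data = nt.getBytes(StandardCharsets.UTF_8);
        int mid = data.length / 2; // an arbitrary cut that lands inside the second line
        List<String> all = new ArrayList<>(readSplit(data, 0, mid));
        all.addAll(readSplit(data, mid, data.length));
        // Each triple is read exactly once even though the cut fell inside a line.
        System.out.println(all.size());
        all.forEach(System.out::println);
    }
}
```

The same trick does not work for formats like Turtle or RDF/XML, where the meaning of a line can depend on prefixes or nesting declared earlier in the file.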
+<p>Some file formats can be processed in multiple ways, i.e., you can either <em>split</em> them into pieces or process them as a whole.  Which approach you should use depends on whether you have a single file or many files to process.  With many files, processing each file as a whole may provide better overall throughput than processing them as chunks.  However, your mileage may vary, especially if your input data consists of many files of uneven size.</p>
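The trade-off can be sketched with a deliberately simple model in which a task's cost is proportional to the bytes it reads. That is an assumption: the model ignores per-task scheduling overhead, which is exactly why processing many small files whole can win in practice. The 128 MB split size, the file sizes and the name `SplitPlanSketch` are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanSketch {

    /** Whole-file processing: one task per file, regardless of its size. */
    static List<Long> wholeFileTasks(long[] fileSizes) {
        List<Long> tasks = new ArrayList<>();
        for (long size : fileSizes) {
            tasks.add(size);
        }
        return tasks;
    }

    /** Split processing: each file is cut into chunks of at most splitSize bytes. */
    static List<Long> splitTasks(long[] fileSizes, long splitSize) {
        List<Long> tasks = new ArrayList<>();
        for (long size : fileSizes) {
            for (long offset = 0; offset < size; offset += splitSize) {
                tasks.add(Math.min(splitSize, size - offset));
            }
        }
        return tasks;
    }

    /** The largest task sets a lower bound on job duration, however many nodes are available. */
    static long largestTask(List<Long> tasks) {
        long max = 0;
        for (long t : tasks) {
            max = Math.max(max, t);
        }
        return max;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = 128 * mb;                    // assumed split size
        long[] uneven = {10 * mb, 20 * mb, 2048 * mb}; // illustrative, uneven file sizes
        List<Long> whole = wholeFileTasks(uneven);
        List<Long> split = splitTasks(uneven, splitSize);
        System.out.println("whole-file: " + whole.size() + " tasks, largest "
                + largestTask(whole) / mb + " MB");
        System.out.println("split:      " + split.size() + " tasks, largest "
                + largestTask(split) / mb + " MB");
    }
}
```

With evenly sized files the two plans produce similar largest tasks and whole-file processing avoids the extra task overhead; a single oversized file, as here, leaves one whole-file task as a straggler.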
+<h2 id="compressed-io">Compressed IO</h2>
+<p>Hadoop natively provides support for compressed input and output, provided your Hadoop cluster is appropriately configured.  The advantage of compressing the input/output data is that there is less IO workload on the cluster; however, this comes with the disadvantage that most compression formats block Hadoop's ability to <em>split</em> up the input.</p>
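Compression usually blocks splitting because a reader generally cannot begin decompressing at an arbitrary byte offset. The Java sketch below uses gzip (via the JDK's `java.util.zip`) as one example: decompressing from the start of the stream succeeds, whereas starting from the middle, as a split reader would have to, fails because the gzip header and the preceding compressor state are missing. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSplitSketch {

    /** Compresses text into an in-memory gzip stream. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Tries to decompress starting at an arbitrary byte offset, as a split reader would. */
    static boolean canDecompressFrom(byte[] compressed, int offset) {
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, offset, compressed.length - offset))) {
            in.readAllBytes();
            return true;
        } catch (IOException e) {
            // Typically "Not in GZIP format": there is no header at this offset.
            return false;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("<http://example.org/s").append(i)
              .append("> <http://example.org/p> \"").append(i).append("\" .\n");
        }
        byte[] compressed = gzip(sb.toString());
        System.out.println("from start:  " + canDecompressFrom(compressed, 0));
        System.out.println("from middle: " + canDecompressFrom(compressed, compressed.length / 2));
    }
}
```

In Hadoop itself, codecs that can be split implement the `SplittableCompressionCodec` interface; `BZip2Codec` is one such codec, whereas gzip-compressed input is not splittable.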
+<h1 id="rdf-io-in-hadoop">RDF IO in Hadoop</h1>
+<p>ARQ supports a wide range of RDF serialisations; please see the <a href="../io/">RDF IO</a> documentation for an overview of the formats that Jena supports.  One of the difficulties posed when wrapping these for Hadoop IO is that the formats have very different properties in terms of our ability to <em>split</em> them into distinct chunks for Hadoop to</p>
+  </div>
+</div>
+
+</div><!--/.container -->
+
+    <footer class="footer">
+      <div class="container">
+        <p>Copyright &copy; 2011&ndash;2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        </p>
+        <p>
+        Apache Jena, Jena, the Apache Jena project logo,
+        Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
+        </p>
+      </div>
+  </footer>
+      
+
+</body>
+</html>