Posted to commits@jena.apache.org by bu...@apache.org on 2015/01/06 17:08:24 UTC

svn commit: r935267 - in /websites/staging/jena/trunk/content: ./ documentation/hadoop/mapred.html

Author: buildbot
Date: Tue Jan  6 16:08:23 2015
New Revision: 935267

Log:
Staging update by buildbot for jena

Added:
    websites/staging/jena/trunk/content/documentation/hadoop/mapred.html
Modified:
    websites/staging/jena/trunk/content/   (props changed)

Propchange: websites/staging/jena/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Tue Jan  6 16:08:23 2015
@@ -1 +1 @@
-1649563
+1649852

Added: websites/staging/jena/trunk/content/documentation/hadoop/mapred.html
==============================================================================
--- websites/staging/jena/trunk/content/documentation/hadoop/mapred.html (added)
+++ websites/staging/jena/trunk/content/documentation/hadoop/mapred.html Tue Jan  6 16:08:23 2015
@@ -0,0 +1,233 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <title>Apache Jena - Apache Jena Elephas - Map/Reduce API</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+
+  <link href="/css/bootstrap.min.css" rel="stylesheet" media="screen">
+  <link href="/css/bootstrap-extension.css" rel="stylesheet" type="text/css">
+  <link href="/css/jena.css" rel="stylesheet" type="text/css">
+  <link rel="shortcut icon" href="/images/favicon.ico" />
+  
+  <script src="https://code.jquery.com/jquery-2.0.3.min.js"></script>
+  <script src="/js/jena-navigation.js" type="text/javascript"></script>
+  <script src="/js/bootstrap.min.js" type="text/javascript"></script>
+  <script src="/js/breadcrumbs.js" type="text/javascript"></script>
+
+  <script src="/js/improve.js" type="text/javascript"></script>
+
+  
+  <!-- Uncomment to enable code coloring <link href="/css/codehilite.css" rel="stylesheet" type="text/css"> -->
+
+</head>
+
+<body>
+
+
+
+<nav class="navbar navbar-default" role="navigation">
+<div class="container">
+  <div class="navbar-header">
+  
+    <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+      <span class="icon-bar"></span>
+    </button>
+    <a class="navbar-brand" href="/index.html">
+    <img class="logo-menu" src="/images/jena-logo/jena-logo-notext-small.png" alt="jena logo">Apache Jena</a>
+  </div>
+ 
+  <div class="collapse navbar-collapse navbar-ex1-collapse">
+    <ul class="nav navbar-nav">
+              <li id="homepage"><a href="/index.html"><span class="glyphicon glyphicon-home"></span> Home</a></li>
+              <li id="download"><a href="/download/index.cgi"><span class="glyphicon glyphicon-download-alt"></span> Download</a></li>
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Learn <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li class="dropdown-header">Tutorials</li>
+                  <li><a href="/tutorials/index.html">Overview</a></li>
+                  <li><a href="/tutorials/rdf_api.html">RDF core API tutorial</a></li>
+                  <li><a href="/tutorials/sparql.html">SPARQL tutorial</a></li>
+                  <li><a href="/documentation/query/manipulating_sparql_using_arq.html">Manipulating SPARQL using ARQ</a></li>
+                  <li><a href="/tutorials/using_jena_with_eclipse.html">Using Jena with Eclipse</a></li>
+                  <li><a href="/documentation/notes/index.html">How-To's</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">References</li>
+                  <li><a href="/documentation/index.html">Overview</a></li>
+                  <li><a href="/documentation/javadoc/">Javadoc</a></li>
+                  <li><a href="/documentation/rdf/index.html">RDF API</a></li>
+                  <li><a href="/documentation/io/">RDF I/O</a></li>
+                  <li><a href="/documentation/query/index.html">ARQ (SPARQL)</a></li>
+                  <li><a href="/documentation/query/text-query.html">Text Search</a></li>
+                  <li><a href="/documentation/tdb/index.html">TDB</a></li>
+		  <li><a href="/documentation/sdb/index.html">SDB</a></li>
+		  <li><a href="/documentation/jdbc/index.html">SPARQL over JDBC</a></li>
+		  <li><a href="/documentation/security/index.html">Security</a></li>
+                  <li><a href="/documentation/serving_data/index.html">Fuseki</a></li>
+                  <li><a href="/documentation/assembler/index.html">Assembler</a></li>
+                  <li><a href="/documentation/ontology/">Ontology API</a></li>
+                  <li><a href="/documentation/inference/index.html">Inference API</a></li>
+                  <li><a href="/documentation/tools/index.html">Command-line tools</a></li>
+                  <li><a href="/documentation/extras/index.html">Extras</a></li>
+                </ul>
+              </li>
+
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-book"></span> Javadoc <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/documentation/javadoc/jena/">Jena Core</a></li>
+                  <li><a href="/documentation/javadoc/arq/">ARQ</a></li>
+                  <li><a href="/documentation/javadoc/tdb/">TDB</a></li>
+                  <li><a href="/documentation/javadoc/text/">Text Search</a></li>
+                  <li><a href="/documentation/javadoc/spatial/">Spatial Search</a></li>
+                  <li><a href="/documentation/javadoc/security/">Security</a></li>
+                  <li><a href="/documentation/javadoc/jdbc/">JDBC</a></li>
+                  <li><a href="/documentation/javadoc/fuseki/">Fuseki</a></li>
+                </ul>
+              </li>
+
+              <li id="ask"><a href="/help_and_support/index.html"><span class="glyphicon glyphicon-question-sign"></span> Ask</a></li>
+              
+              <li class="dropdown">
+                <a href="#" class="dropdown-toggle" data-toggle="dropdown"><span class="glyphicon glyphicon-bullhorn"></span> Get involved <b class="caret"></b></a>
+                <ul class="dropdown-menu">
+                  <li><a href="/getting_involved/index.html">Contribute</a></li>
+                  <li><a href="/help_and_support/bugs_and_suggestions.html">Report a bug</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">Project</li>
+                  <li><a href="/about_jena/about.html">About Jena</a></li>
+                  <li><a href="/about_jena/roadmap.html">Roadmap</a></li>
+                  <li><a href="/about_jena/architecture.html">Architecture</a></li>
+                  <li><a href="/about_jena/team.html">Project team</a></li>
+                  <li><a href="/about_jena/contributions.html">Related projects</a></li>
+                  <li class="divider"></li>
+                  <li class="dropdown-header">ASF</li>
+                  <li><a href="http://www.apache.org/">Apache Software Foundation</a></li>
+                  <li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+                  <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+                  <li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+                  <li><a href="http://www.apache.org/security/">Security</a></li>
+                </ul>
+              </li>
+
+              <li id="edit"><a href="javascript:improveThisPage(location.href);" title="Improve this Page (Use username anonymous and empty password)"><span class="glyphicon glyphicon-pencil"></span> Improve this Page</a></li>   
+    </ul>
+  </div>
+</div>
+</nav>
+
+
+<div class="container">
+	<div class="row">
+	<div class="col-md-12">
+	<div id="breadcrumbs"></div>
+	<h1 class="title">Apache Jena Elephas - Map/Reduce API</h1>
+  <p>The Map/Reduce API provides a range of building-block <code>Mapper</code> and <code>Reducer</code> implementations that can be used as a starting point for building Map/Reduce applications that process RDF.  More complex applications will typically need to implement their own variants, but these basic ones may still prove useful as part of a larger pipeline.</p>
+<div class="toc">
+<ul>
+<li><a href="#tasks">Tasks</a><ul>
+<li><a href="#counting">Counting</a><ul>
+<li><a href="#node-usage">Node Usage</a></li>
+<li><a href="#literal-data-types">Literal Data Types</a></li>
+<li><a href="#namespaces">Namespaces</a></li>
+</ul>
+</li>
+<li><a href="#filtering">Filtering</a><ul>
+<li><a href="#valid-data">Valid Data</a></li>
+<li><a href="#ground-data">Ground Data</a></li>
+<li><a href="#data-with-a-specific-uri">Data with a specific URI</a></li>
+</ul>
+</li>
+<li><a href="#grouping">Grouping</a></li>
+<li><a href="#splitting">Splitting</a></li>
+<li><a href="#transforming">Transforming</a></li>
+</ul>
+</li>
+</ul>
+</div>
+<h1 id="tasks">Tasks</h1>
+<p>The API is organised around common Hadoop tasks, with appropriate <code>Mapper</code> and <code>Reducer</code> implementations provided for each.  In most cases these implementations are at least partially abstract, which makes it easy to implement customised versions of them.</p>
+<p>The following common tasks are supported:</p>
+<ul>
+<li>Counting</li>
+<li>Filtering</li>
+<li>Grouping</li>
+<li>Splitting</li>
+<li>Transforming</li>
+</ul>
+<h2 id="counting">Counting</h2>
+<p>Counting is one of the classic Map/Reduce tasks and features as the official Map/Reduce example for both Hadoop itself and Elephas.  Implementations cover a number of different counting tasks that you might want to carry out upon RDF data.  In most cases you will use the desired <code>Mapper</code> implementation in conjunction with the <code>NodeCountReducer</code>.</p>
+<h3 id="node-usage">Node Usage</h3>
+<p>The simplest type of counting supported is to count the usages of individual RDF nodes within the triples/quads.  Depending on whether your data is triples or quads you can use either the <code>TripleNodeCountMapper</code> or the <code>QuadNodeCountMapper</code>.</p>
+<p>If you want to count only usages of RDF nodes in a specific position then we also provide variants for that, for example <code>TripleSubjectCountMapper</code> counts only RDF nodes present in the subject position.  You can substitute <code>Predicate</code> or <code>Object</code> into the class name in place of <code>Subject</code> if you prefer to count just RDF nodes in the predicate/object position instead.  Similarly, replace <code>Triple</code> with <code>Quad</code> if you wish to count usage of RDF nodes in specific positions of quads.  An additional <code>QuadGraphCountMapper</code> is available if you want to calculate the size of graphs.</p>
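+<p>As a rough illustration of what node usage counting computes, the following plain Java sketch models triples as three-element string arrays and totals node usages in memory.  It does not use Hadoop or Elephas: the class <code>NodeUsageCountSketch</code>, its methods and the array representation are assumptions made purely for this example.</p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not part of Elephas.
// A triple is modelled as a String[3] of {subject, predicate, object}.
class NodeUsageCountSketch {

    // Counts every node in every position: conceptually the mapper emits
    // (node, 1) for each node and the reducer sums the values per node.
    static Map<String, Long> countAllNodes(List<String[]> triples) {
        Map<String, Long> counts = new TreeMap<>();
        for (String[] triple : triples) {
            for (String node : triple) {
                counts.merge(node, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Counts only the nodes in one position (0 = subject, 1 = predicate,
    // 2 = object), analogous to the position-specific mappers in the text.
    static Map<String, Long> countPosition(List<String[]> triples, int position) {
        Map<String, Long> counts = new TreeMap<>();
        for (String[] triple : triples) {
            counts.merge(triple[position], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> triples = Arrays.asList(
            new String[] {"ex:a", "ex:knows", "ex:b"},
            new String[] {"ex:b", "ex:knows", "ex:a"},
            new String[] {"ex:a", "ex:name", "\"Alice\""});
        System.out.println(countAllNodes(triples));
        System.out.println(countPosition(triples, 0));
    }
}
```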
+<h3 id="literal-data-types">Literal Data Types</h3>
+<p>Another interesting variant of counting is to count the usage of literal data types.  You can use the <code>TripleDataTypeCountMapper</code> or <code>QuadDataTypeCountMapper</code> if you want to do this.</p>
+<h3 id="namespaces">Namespaces</h3>
+<p>Finally, you may be interested in the usage of namespaces within your data, in which case you can use the <code>TripleNamespaceCountMapper</code> or <code>QuadNamespaceCountMapper</code>.  For this use case you should use the <code>TextCountReducer</code> to total up the counts for each namespace.  Note that the mappers determine the namespace for a URI simply by splitting after the last <code>#</code> or <code>/</code> in the URI; if no such character exists then the full URI is considered to be the namespace.</p>
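+<p>The namespace rule described above can be sketched in a few lines of plain Java.  This is not the Elephas source: <code>NamespaceSketch</code> and <code>namespaceOf</code> are hypothetical names, and the sketch assumes that "the last <code>#</code> or <code>/</code>" means whichever of the two occurs later in the URI and that the separator itself is kept as part of the namespace.</p>

```java
// Hypothetical stand-in for illustration only; not part of Elephas.
class NamespaceSketch {

    // Returns everything up to and including the last '#' or '/' in the URI.
    // If neither character occurs, the whole URI is treated as the namespace.
    static String namespaceOf(String uri) {
        int split = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
        return split < 0 ? uri : uri.substring(0, split + 1);
    }

    public static void main(String[] args) {
        System.out.println(namespaceOf("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"));
        System.out.println(namespaceOf("http://xmlns.com/foaf/0.1/name"));
        System.out.println(namespaceOf("urn:isbn:0451450523"));
    }
}
```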
+<h2 id="filtering">Filtering</h2>
+<p>Filtering is another classic Map/Reduce use case.  Here you want to take the data and extract only the portions that you are interested in based on some criteria.  All our filter <code>Mapper</code> implementations also support a Job configuration option named <code>rdf.mapreduce.filter.invert</code>, which allows their effects to be inverted if desired.</p>
+<h3 id="valid-data">Valid Data</h3>
+<p>The <code>ValidTripleFilterMapper</code> and the <code>ValidQuadFilterMapper</code> may be useful, particularly if you are generating RDF data that may not be strict RDF.  These filters only keep triples/quads that are valid according to strict RDF semantics, i.e.</p>
+<ul>
+<li>Subject can only be URI/Blank Node</li>
+<li>Predicate can only be a URI</li>
+<li>Object can be a URI/Blank Node/Literal</li>
+<li>Graph can only be a URI or Blank Node</li>
+</ul>
+<p>If you want to extract only the bad data, e.g. for debugging, then you can of course invert these filters by setting <code>rdf.mapreduce.filter.invert</code> to <code>true</code>.</p>
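+<p>A minimal sketch of the validity rules listed above, together with the effect of inverting a filter, is shown below.  It is plain Java with no Hadoop or Jena dependency: the <code>Kind</code> enum, the class name and the method names are hypothetical stand-ins, and the <code>invert</code> parameter merely mimics what setting <code>rdf.mapreduce.filter.invert</code> to <code>true</code> is described as doing.</p>

```java
// Hypothetical stand-in for illustration only; not part of Elephas.
class ValidDataSketch {

    // Simplified node kinds; OTHER covers anything else (e.g. a variable).
    enum Kind { URI, BLANK, LITERAL, OTHER }

    // Subject: URI or blank node. Predicate: URI only.
    // Object: URI, blank node or literal.
    static boolean isValidTriple(Kind subject, Kind predicate, Kind object) {
        boolean subjectOk = subject == Kind.URI || subject == Kind.BLANK;
        boolean predicateOk = predicate == Kind.URI;
        boolean objectOk = object == Kind.URI || object == Kind.BLANK
                || object == Kind.LITERAL;
        return subjectOk && predicateOk && objectOk;
    }

    // A quad additionally requires the graph to be a URI or blank node.
    static boolean isValidQuad(Kind graph, Kind subject, Kind predicate, Kind object) {
        boolean graphOk = graph == Kind.URI || graph == Kind.BLANK;
        return graphOk && isValidTriple(subject, predicate, object);
    }

    // With invert == false, accepted tuples are kept; with invert == true,
    // only the rejected tuples are kept.
    static boolean keep(boolean accepted, boolean invert) {
        return accepted != invert;
    }

    public static void main(String[] args) {
        boolean literalSubject = isValidTriple(Kind.LITERAL, Kind.URI, Kind.URI);
        System.out.println("valid: " + literalSubject);
        System.out.println("kept normally: " + keep(literalSubject, false));
        System.out.println("kept when inverted: " + keep(literalSubject, true));
    }
}
```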
+<h3 id="ground-data">Ground Data</h3>
+<p>In some cases you may only be interested in triples/quads that are grounded, i.e. that don't contain blank nodes, in which case the <code>GroundTripleFilterMapper</code> and <code>GroundQuadFilterMapper</code> can be used.</p>
+<h3 id="data-with-a-specific-uri">Data with a specific URI</h3>
+<p>In many cases you may want to extract only data where a specific URI occurs in a specific position.  For example, if you wanted to extract all the <code>rdf:type</code> declarations then you might want to use the <code>TripleFilterByPredicateUriMapper</code> or <code>QuadFilterByPredicateUriMapper</code> as appropriate.  The job configuration option <code>rdf.mapreduce.filter.predicate.uris</code> is used to provide a comma separated list of the full URIs you want the filter to accept.</p>
+<p>As with the counting of node usage, you can replace <code>Predicate</code> with <code>Subject</code>, <code>Object</code> or <code>Graph</code> as desired.  You will also need to make the same substitution in the job configuration option, for example to filter on subject URIs in quads use the <code>QuadFilterBySubjectUriMapper</code> and the <code>rdf.mapreduce.filter.subject.uris</code> configuration option.</p>
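+<p>The sketch below illustrates, in plain Java, how such an option name might be derived for a position and how a comma separated list of URIs can be turned into an accept set.  It is not the Elephas implementation: the class and method names are hypothetical, a <code>Map</code> stands in for the Hadoop job configuration, the option names for the object and graph positions are inferred from the pattern of the two names given above, and trimming whitespace around each URI is an assumption.</p>

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of Elephas.
class UriFilterSketch {

    // Builds the option name for a position such as "Predicate" or "Subject",
    // following the pattern of the option names quoted in the text.
    static String optionFor(String position) {
        return "rdf.mapreduce.filter." + position.toLowerCase(Locale.ROOT) + ".uris";
    }

    // Parses a comma separated list of full URIs into a set.
    // Trimming and skipping empty entries are assumptions of this sketch.
    static Set<String> parseUris(String value) {
        Set<String> uris = new LinkedHashSet<>();
        if (value == null) {
            return uris;
        }
        for (String part : value.split(",")) {
            String uri = part.trim();
            if (!uri.isEmpty()) {
                uris.add(uri);
            }
        }
        return uris;
    }

    public static void main(String[] args) {
        // A plain Map stands in for the job configuration here.
        Map<String, String> config = new HashMap<>();
        config.put(optionFor("Predicate"),
                "http://www.w3.org/1999/02/22-rdf-syntax-ns#type");

        Set<String> accepted = parseUris(config.get(optionFor("Predicate")));
        String predicate = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
        System.out.println("accepted: " + accepted.contains(predicate));
    }
}
```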
+<h2 id="grouping">Grouping</h2>
+<p>Grouping is another frequent Map/Reduce use case.  Here we provide implementations that allow you to group triples or quads by a specific RDF node within the triples/quads, e.g. by subject.  For example, to group quads by predicate use the <code>QuadGroupByPredicateMapper</code>.  As with filtering and counting, you can replace <code>Predicate</code> with <code>Subject</code>, <code>Object</code> or <code>Graph</code> if you wish to group by another node of the triple/quad.</p>
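+<p>Conceptually, grouping keys each triple/quad by one of its nodes so that tuples sharing that node end up together.  The in-memory Java sketch below shows the idea for quads modelled as four-element string arrays.  The class <code>GroupBySketch</code>, its constants and the array layout are assumptions for illustration and are not part of Elephas.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not part of Elephas.
// A quad is modelled as a String[4] of {graph, subject, predicate, object}.
class GroupBySketch {

    static final int GRAPH = 0;
    static final int SUBJECT = 1;
    static final int PREDICATE = 2;
    static final int OBJECT = 3;

    // Groups quads by the node found at the given position.
    static Map<String, List<String[]>> groupBy(List<String[]> quads, int position) {
        Map<String, List<String[]>> groups = new TreeMap<>();
        for (String[] quad : quads) {
            groups.computeIfAbsent(quad[position], k -> new ArrayList<>()).add(quad);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> quads = Arrays.asList(
            new String[] {"ex:g1", "ex:a", "ex:knows", "ex:b"},
            new String[] {"ex:g1", "ex:b", "ex:name", "\"Bob\""},
            new String[] {"ex:g2", "ex:b", "ex:knows", "ex:a"});
        for (Map.Entry<String, List<String[]>> group : groupBy(quads, PREDICATE).entrySet()) {
            System.out.println(group.getKey() + " has " + group.getValue().size() + " quad(s)");
        }
    }
}
```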
+<h2 id="splitting">Splitting</h2>
+<p>Splitting allows you to split triples/quads up into their constituent RDF nodes.  We provide two kinds of splitting:</p>
+<ul>
+<li>To Nodes - Splits pairs of arbitrary keys with triple/quad values into several pairs of the key with the nodes as the values.</li>
+<li>With Nodes - Splits pairs of arbitrary keys with triple/quad values keeping the triple/quad as the key and the nodes as the values.</li>
+</ul>
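+<p>The difference between the two kinds of splitting lies in what is used as the output key.  The plain Java sketch below makes this concrete for tuples modelled as string arrays.  The class and method names are hypothetical, and the reading that With Nodes discards the incoming key is an interpretation of the description above rather than something confirmed against the Elephas source.</p>

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Elephas.
// A triple/quad is modelled as a String[] of its nodes.
class SplitSketch {

    // "To Nodes": (key, tuple) becomes one (key, node) pair per node.
    static <K> List<Map.Entry<K, String>> splitToNodes(K key, String[] tuple) {
        List<Map.Entry<K, String>> out = new ArrayList<>();
        for (String node : tuple) {
            out.add(new SimpleEntry<>(key, node));
        }
        return out;
    }

    // "With Nodes": (key, tuple) becomes one (tuple, node) pair per node.
    // The incoming key is not used here, which is an assumption of this sketch.
    static <K> List<Map.Entry<String[], String>> splitWithNodes(K key, String[] tuple) {
        List<Map.Entry<String[], String>> out = new ArrayList<>();
        for (String node : tuple) {
            out.add(new SimpleEntry<>(tuple, node));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] triple = {"ex:a", "ex:knows", "ex:b"};
        for (Map.Entry<Long, String> pair : splitToNodes(42L, triple)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
        for (Map.Entry<String[], String> pair : splitWithNodes(42L, triple)) {
            System.out.println(String.join(" ", pair.getKey()) + " -> " + pair.getValue());
        }
    }
}
```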
+<h2 id="transforming">Transforming</h2>
+<p>Transforming provides some very simple implementations that allow you to convert between triples and quads.  For the lossy case of going from quads to triples simply use the <code>QuadsToTriplesMapper</code>.</p>
+<p>If you want to go the other way, from triples to quads, this requires adding a graph field to each triple and we provide two implementations that do that.  Firstly there is <code>TriplesToQuadsBySubjectMapper</code>, which puts each triple into a graph based on its subject, i.e. all triples with a common subject go into a graph named for the subject.  Secondly there is <code>TriplesToQuadsConstantGraphMapper</code>, which simply puts all triples into the default graph; if you wish to change the target graph you should extend this class.  If you want to select the graph based on some arbitrary criteria you should look at extending the <code>AbstractTriplesToQuadsMapper</code> instead.</p>
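+<p>The three conversions described in this section can be sketched in plain Java as follows, with triples and quads modelled as string arrays.  Nothing here is Elephas code: the class, the method names and the <code>DEFAULT_GRAPH</code> placeholder are hypothetical and only illustrate which node ends up in the graph position.</p>

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not part of Elephas.
// Triple: {subject, predicate, object}. Quad: {graph, subject, predicate, object}.
class TransformSketch {

    // Placeholder label for the default graph; not a real Jena identifier.
    static final String DEFAULT_GRAPH = "DEFAULT_GRAPH";

    // Lossy conversion: the graph field is simply dropped.
    static String[] quadToTriple(String[] quad) {
        return new String[] {quad[1], quad[2], quad[3]};
    }

    // Each triple goes into a graph named for its own subject.
    static String[] tripleToQuadBySubject(String[] triple) {
        return new String[] {triple[0], triple[0], triple[1], triple[2]};
    }

    // Every triple goes into the same, fixed graph.
    static String[] tripleToQuadConstantGraph(String[] triple, String graph) {
        return new String[] {graph, triple[0], triple[1], triple[2]};
    }

    public static void main(String[] args) {
        String[] triple = {"ex:a", "ex:knows", "ex:b"};
        System.out.println(Arrays.toString(tripleToQuadBySubject(triple)));
        System.out.println(Arrays.toString(tripleToQuadConstantGraph(triple, DEFAULT_GRAPH)));
        System.out.println(Arrays.toString(quadToTriple(tripleToQuadBySubject(triple))));
    }
}
```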
+  </div>
+</div>
+
+</div><!--/.container -->
+
+    <footer class="footer">
+      <div class="container">
+        <p>Copyright &copy; 2011&ndash;2014 The Apache Software Foundation, Licensed under
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        </p>
+        <p>
+        Apache Jena, Jena, the Apache Jena project logo,
+        Apache and the Apache feather logos are trademarks of The Apache Software Foundation.
+        </p>
+      </div>
+  </footer>
+      
+
+</body>
+</html>