Posted to commits@stanbol.apache.org by bu...@apache.org on 2012/04/11 10:32:42 UTC

svn commit: r812319 - in /websites/staging/stanbol/trunk/content: ./ stanbol/docs/trunk/enhancer/contentitem.html stanbol/docs/trunk/enhancer/contentitemfactory.html

Author: buildbot
Date: Wed Apr 11 08:32:42 2012
New Revision: 812319

Log:
Staging update by buildbot for stanbol

Added:
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitemfactory.html
Modified:
    websites/staging/stanbol/trunk/content/   (props changed)
    websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html

Propchange: websites/staging/stanbol/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Wed Apr 11 08:32:42 2012
@@ -1 +1 @@
-1324628
+1324631

Modified: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html (original)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitem.html Wed Apr 11 08:32:42 2012
@@ -46,7 +46,7 @@
 <ul>
 <li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
 </ul>
-<h1 id="the_asf">The ASF</h1>
+<h1 id="the-asf">The ASF</h1>
 <ul>
 <li><a href="http://www.apache.org">Apache Software Foundation</a></li>
 <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
@@ -58,9 +58,9 @@
   <div id="content">
     <h1 class="title">Content Item</h1>
     <p><span style="float:right"> <img alt="Content Item Overview" src="contentitemoverview.png" title="The ContentItem can contain several ContentParts and the Enhancement Metadata - an RDF Graph" /></span> </p>
-<p>The ContentItem is the object which represents the content to be enhanced by Apache Stanbol. It is created based on the data provided by the enhancement request and used throughout the enhancement process to store results. Therefore, after the enhancement process has finished, the ContentItem represents the result of the Apache Stanbol enhancement process.</p>
+<p>The ContentItem is the object which represents the content to be enhanced by Apache Stanbol. It is created based on the data provided by the enhancement request and used throughout the enhancement process to store results. Therefore, after the enhancement process has finished, the ContentItem represents the result of the Apache Stanbol enhancement process. ContentItem instances are created by using the <a href="contentitemfactory.html">ContentItemFactory</a> service.</p>
 <p>The following section describes the interface of the ContentItem in detail:</p>
-<h3 id="content_parts">Content Parts</h3>
+<h3 id="content-parts">Content Parts</h3>
 <p>Content parts are used to represent the original content as well as transformations of the original content (typically created by pre-processing <a href="engines/list.html">enhancement engines</a> such as the <a href="engines/metaxaengine.html">Metaxa engine</a>). </p>
 <p>The ContentItem provides the following API to work with content parts:</p>
 <div class="codehilite"><pre><span class="cm">/** Getter for the ContentPart based on the index */</span>
@@ -80,7 +80,7 @@
 <li>Content parts which have additional metadata provided within the metadata of the content item. Such content parts are typically used to store transformed versions of the original content. This allows e.g. engines which can only process plain text versions to query for the content part containing this version of the parsed document.</li>
 <li>Content parts that are registered under a predefined URI. Such content parts are typically not mentioned within the metadata of the content item. This is used to share intermediate enhancement results between enhancement engines. An example would be tokens, sentences, POS tags and chunks that are extracted by some NLP engine. Engines which want to consume such data need to know the predefined URI of the content part holding this data. They will check within the <code>canEnhance(..)</code> method if a content part with an expected URI is present and if it has the correct type. </li>
 </ol>
-<h3 id="accessing_the_main_content_of_the_contentitem">Accessing the main content of the ContentItem</h3>
+<h3 id="accessing-the-main-content-of-the-contentitem">Accessing the main content of the ContentItem</h3>
 <p>The main content of the ContentItem refers to the content parsed by the enhancement request (or downloaded from the URL provided by a request). For accessing this content the following methods are available</p>
 <div class="codehilite"><pre><span class="cm">/** Getter for the InputStream of the content as parsed</span>
 <span class="cm">    for the ContentItem */</span>
@@ -101,10 +101,10 @@
 
 
 <p>returns the same blob instance.</p>
-<h3 id="metadata_of_the_contentitem">Metadata of the ContentItem</h3>
+<h3 id="metadata-of-the-contentitem">Metadata of the ContentItem</h3>
 <p>The metadata of the ContentItem is managed by a lockable MGraph. This is basically a normal <code>java.util.Collection</code> of triples. The only RDF specific feature is the support for filtered iterators, which accept wildcards for subjects, predicates and objects.</p>
 <p>This graph is used to store all enhancement results as well as metadata about the content item (such as content parts) and the enhancement process (see <a href="executionmetadata.html">execution metadata</a>).</p>
-<h3 id="readwrite_locks">Read/Write locks</h3>
+<h3 id="readwrite-locks">Read/Write locks</h3>
 <p>During the Apache Stanbol enhancement process as executed by the <a href="enhancementjobmanager.html">enhancement job manager</a> components running in multiple threads need to access the state of the <em>ContentItem</em>. Because of that the content item provides the possibility to acquire locks.</p>
 <div class="codehilite"><pre><span class="cm">/** Getter for the ReadWriteLock of a ContentItem */</span>
 <span class="o">+</span> <span class="n">getLock</span><span class="o">()</span> <span class="o">:</span> <span class="n">java</span><span class="o">.</span><span class="na">util</span><span class="o">.</span><span class="na">concurrent</span><span class="o">.</span><span class="na">ReadWriteLock</span>
@@ -135,7 +135,16 @@
 
 
 <p>While accessing content items within an <a href="engines">enhancement engine</a> there is an exception to this rule. If an engine declares that it only supports the <code>SYNCHRONOUS</code> enhancement mode, then the <a href="enhancementjobmanager.html">enhancement job manager</a> needs to take care that the engine has exclusive access to the <em>ContentItem</em>. In this case implementors of enhancement engines need not care about using read/write locks.</p>
-<h3 id="multipart_mime_serialization">Multipart MIME serialization</h3>
+<h3 id="contentitemfactory">ContentItemFactory</h3>
+<p>Since version 0.10.0, ContentItems and Blobs are created by using the <a href="contentitemfactory.html">ContentItemFactory</a>. ContentItemFactory implementations register themselves as OSGi services. The Stanbol Enhancer uses the implementation with the highest "service.ranking" to create instances. Two implementations are available by default, an in-memory one and a file-based one; the in-memory implementation is used as the default.</p>
+<p>Most users will not need to change the default ContentItem implementation. However, if the Enhancer is used to extract metadata from big media files (such as EXIF metadata from big images or ID3 tags from MP3 files), then changing the default from the InMemoryContentItemFactory to the FileContentItemFactory might considerably reduce the memory footprint. </p>
+<p>With the introduction of the ContentItemFactory, all ContentItem implementation specific constructors for passing content were deprecated and replaced by the following three interfaces:</p>
+<ol>
+<li><strong>ContentSource</strong> allows passing content that is available as a stream, byte array or string.</li>
+<li><strong>ContentReference</strong> allows passing a reference (e.g. a URL) to the content instead of the content itself. The dereference() method of this interface is used by the ContentItemFactory to convert a ContentReference to a ContentSource.</li>
+<li><strong>ContentSink</strong> allows obtaining an OutputStream to an initially empty Blob that can later be used to stream the content. This is intended for EnhancementEngines that need to convert content from one format to another, because it avoids caching the converted content in-memory.</li>
+</ol>
+<h3 id="multipart-mime-serialization">Multipart MIME serialization</h3>
 <p><span style="float:right"> <img alt="ContentItem Multipart MIME format" src="contentitemmultipartmime.png" title="This figure provides an overview how Content Items are serialized as MultiPart MIME" /></span></p>
 <p>Stanbol supports the serialization of content items as multipart MIME. This serialization is used by the RESTful API of the Stanbol Enhancer. This section provides details about how content items are represented using multipart MIME. For more information on how to send/receive multipart content items via the RESTful Services provided by the Stanbol Enhancer please see the documentation provided in the web interface (e.g. at http://localhost:8080/enhancer).</p>
 <p>The following figure provides an overview on how ContentItems are represented using MultiPart MIME.</p>

Added: websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitemfactory.html
==============================================================================
--- websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitemfactory.html (added)
+++ websites/staging/stanbol/trunk/content/stanbol/docs/trunk/enhancer/contentitemfactory.html Wed Apr 11 08:32:42 2012
@@ -0,0 +1,180 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+  <link href="/stanbol/css/stanbol.css" rel="stylesheet" type="text/css">
+  <title>Apache Stanbol - Content Item Factory</title>
+  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+  <link rel="icon" type="image/png" href="/stanbol/images/stanbol-logo/stanbol-favicon.png"/>
+</head>
+
+<body>
+  <div id="navigation"> 
+  <a href="/stanbol/index.html"><img alt="Apache Stanbol" width="220" height="101" border="0" src="/stanbol/images/stanbol-logo/stanbol-2010-12-14.png"/></a>
+  <h1 id="stanbol">Stanbol</h1>
+<ul>
+<li><a href="/stanbol/index.html">Home</a></li>
+<li><a href="/stanbol/docs/trunk/tutorial.html">Tutorial</a></li>
+<li><a href="/stanbol/docs/trunk/">Documentation</a></li>
+<li><a href="/stanbol/docs/trunk/building.html">Building</a></li>
+</ul>
+<h1 id="project">Project</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/mailinglists.html">Mailing Lists</a></li>
+<li><a href="https://issues.apache.org/jira/browse/STANBOL">Issue Tracker</a></li>
+<li><a href="/stanbol/team.html">Project Team</a></li>
+<li><a href="http://www.apache.org/licenses/LICENSE-2.0">License</a></li>
+</ul>
+<h1 id="downloads">Downloads</h1>
+<ul>
+<li><a href="/stanbol/docs/trunk/downloads.html">Overview</a></li>
+</ul>
+<h1 id="the-asf">The ASF</h1>
+<ul>
+<li><a href="http://www.apache.org">Apache Software Foundation</a></li>
+<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+<li><a href="http://www.apache.org/foundation/sponsorship.html">Become a Sponsor</a></li>
+<li><a href="http://www.apache.org/security/">Security</a></li>
+</ul>
+  </div>
+  
+  <div id="content">
+    <h1 class="title">Content Item Factory</h1>
+    <p>The ContentItemFactory is used by the Stanbol Enhancer to create <a href="contentitem.html">ContentItem</a> and Blob instances. ContentItemFactory implementations typically register themselves as OSGi services. The Stanbol Enhancer will use the factory implementation with the highest "service.ranking" to create ContentItems and Blobs for requests on the RESTful API. When using the Java API any ContentItem implementation can be used.</p>
+<h3 id="contentitemfactory-interface">ContentItemFactory interface</h3>
+<p>The interface of the ContentItemFactory defines the following methods to create ContentItems</p>
+<div class="codehilite"><pre><span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">ContentSource</span> <span class="n">source</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">String</span> <span class="n">prefix</span><span class="o">,</span> <span class="n">ContentSource</span> <span class="n">source</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">UriRef</span> <span class="n">id</span><span class="o">,</span> <span class="n">ContentSource</span> <span class="n">source</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">String</span> <span class="n">prefix</span><span class="o">,</span> <span class="n">ContentSource</span> <span class="n">source</span><span class="o">,</span> <span class="n">MGraph</span> <span class="n">metadata</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">UriRef</span> <span class="n">id</span><span class="o">,</span> <span class="n">ContentSource</span> <span class="n">source</span><span class="o">,</span> <span class="n">MGraph</span> <span class="n">metadata</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">ContentReference</span> <span class="n">reference</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+<span class="o">+</span> <span class="n">createContentItem</span><span class="o">(</span><span class="n">ContentReference</span> <span class="n">reference</span><span class="o">,</span> <span class="n">MGraph</span> <span class="n">metadata</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentItem</span>
+</pre></div>
+
+
+<p>The content for the created ContentItem can be passed by using either a ContentSource or a ContentReference. The Stanbol Enhancer servicesapi module provides ContentSource implementations for Java streams, byte arrays and String objects as well as a ContentReference implementation for URLs. For details see the sections below.</p>
+<p>The URI of the created ContentItem is determined as follows:</p>
+<ul>
+<li>If no URI is passed, then it is calculated by using a default prefix plus a digest over the passed content. This ensures that if the same content is passed several times, the created ContentItems will use the same id.</li>
+<li>Methods that take a <strong>prefix</strong> will also generate the URI by calculating a digest over the passed content. However, the passed prefix will be used instead of the default one.</li>
+<li>If a <strong>UriRef id</strong> is passed, then that URI is used as the id of the content item.</li>
+</ul>
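+<p>The digest-based rules above can be illustrated with a small, self-contained sketch. It is not the Stanbol implementation: the class <code>ContentUriSketch</code>, the prefix <code>urn:content-item-</code> and the choice of SHA-1 as digest are assumptions made only for this example, since this page does not state the actual default prefix or digest algorithm.</p>

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration; not part of the Stanbol API. */
final class ContentUriSketch {

    /** Assumed default prefix; the real default is not stated on this page. */
    static final String DEFAULT_PREFIX = "urn:content-item-";

    private ContentUriSketch() {}

    /** Returns prefix + hex digest of the content (SHA-1 is an assumption). */
    static String uriFor(String prefix, byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder uri = new StringBuilder(prefix);
            for (byte b : md.digest(content)) {
                uri.append(String.format("%02x", b));
            }
            return uri.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] first = "Paris is the capital of France.".getBytes(StandardCharsets.UTF_8);
        byte[] again = "Paris is the capital of France.".getBytes(StandardCharsets.UTF_8);
        // Same content and same prefix: both calls produce the same id.
        System.out.println(uriFor(DEFAULT_PREFIX, first).equals(uriFor(DEFAULT_PREFIX, again)));
        // A caller-supplied prefix replaces the default one.
        System.out.println(uriFor("urn:example:", first));
    }
}
```

+<p>Because the id depends only on the prefix and the content bytes, passing the same content twice maps to the same id, which is the property described in the first list item.</p>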
+<p>The ContentItemFactory also allows passing pre-existing metadata. All RDF triples in the passed MGraph are guaranteed to be added to the metadata of the created ContentItem. Note that implementations are free to either use the passed MGraph instance directly for the metadata or to create a new MGraph instance and copy all triples of the passed instance.</p>
+<p>The following methods of the ContentItemFactory can be used to create Blobs</p>
+<div class="codehilite"><pre><span class="o">+</span> <span class="n">createBlob</span><span class="o">(</span><span class="n">ContentSource</span> <span class="n">source</span><span class="o">)</span> <span class="o">:</span> <span class="n">Blob</span>
+<span class="o">+</span> <span class="n">createBlob</span><span class="o">(</span><span class="n">ContentReference</span> <span class="n">reference</span><span class="o">)</span> <span class="o">:</span> <span class="n">Blob</span>
+<span class="o">+</span> <span class="n">createContentSink</span><span class="o">(</span><span class="n">String</span> <span class="n">mediaType</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentSink</span>
+</pre></div>
+
+
+<p>The Blob interface is used by the Stanbol Enhancer to represent content. Blobs are added to ContentItems as <a href="contentitem.html#content-parts">content parts</a>. Blobs can be created from the same ContentSource and ContentReference interfaces that are supported for the creation of ContentItems. In addition, a ContentSink can be used. A ContentSink allows obtaining an OutputStream to an initially empty Blob that can later be used to stream the content. This is intended for EnhancementEngines that need to convert content from one format to another, because it avoids caching the converted content in-memory.</p>
+<h3 id="contentitem-implementations">ContentItem implementations</h3>
+<p>By default the Stanbol Enhancer provides two ContentItemFactory/ContentItem/Blob implementations. Users can control the implementation used by the Stanbol Enhancer by configuring the "service.ranking" property of the different ContentItemFactory implementations (e.g. via the configuration tab of the Apache Felix Web Console). The implementation with the highest "service.ranking" will be used by the Stanbol Enhancer to create ContentItems and Blobs. </p>
+<h4 id="in-memory-contentitem">In-memory ContentItem</h4>
+<p>This implementation manages contents - Blobs - as byte arrays that are kept in-memory. While this ensures fast access to the passed content it also might cause problems if the Stanbol Enhancer is used to process big media files. Nonetheless this is currently used as default, because for typical usage scenarios content processed by the Stanbol Enhancer easily fits into memory.</p>
+<p>The ContentItemFactory of this implementation registers itself with a "service.ranking" of 100 and is therefore used as default by the Stanbol Enhancer.</p>
+<h4 id="file-based-contentitem">File-based ContentItem</h4>
+<p>This implementation differs from the in-memory one in that it stores content - Blobs - in temporary files on the hard disk. All other information, such as the metadata or non-Blob content parts, is still kept in-memory. This implementation is intended for users that use the Stanbol Enhancer to process big media files such as TIFF images, MP3 files, rich text files including big graphics or even video files. </p>
+<p>The ContentItemFactory of the file-based implementation is registered with a "service.ranking" of 50. To use it as default, users need to ensure that the ranking of this implementation is higher than the one of the in-memory implementation.</p>
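+<p>The selection rule can be sketched in plain Java without an OSGi container. The classes <code>RankedFactory</code> and <code>FactorySelectionSketch</code> are hypothetical stand-ins for registered services and are not Stanbol or OSGi types; only the ranking values 100 and 50 are taken from this page.</p>

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a registered ContentItemFactory service. */
final class RankedFactory {
    final String name;
    final int serviceRanking;

    RankedFactory(String name, int serviceRanking) {
        this.name = name;
        this.serviceRanking = serviceRanking;
    }
}

/** Illustrates "the highest service.ranking wins"; not the real lookup code. */
final class FactorySelectionSketch {

    private FactorySelectionSketch() {}

    static RankedFactory select(List<RankedFactory> registered) {
        return registered.stream()
                .max(Comparator.comparingInt((RankedFactory f) -> f.serviceRanking))
                .orElseThrow(() -> new IllegalStateException("no factory registered"));
    }

    public static void main(String[] args) {
        // Rankings as documented: in-memory = 100, file-based = 50.
        List<RankedFactory> defaults = Arrays.asList(
                new RankedFactory("in-memory", 100),
                new RankedFactory("file-based", 50));
        System.out.println(select(defaults).name);

        // Raising the file-based ranking above 100 makes it the default.
        List<RankedFactory> reconfigured = Arrays.asList(
                new RankedFactory("in-memory", 100),
                new RankedFactory("file-based", 200));
        System.out.println(select(reconfigured).name);
    }
}
```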
+<h3 id="contentsource">ContentSource</h3>
+<p>This interface describes the source of a content. It defines the following API</p>
+<div class="codehilite"><pre><span class="cm">/** the content as stream */</span>
+<span class="o">+</span> <span class="n">getStream</span><span class="o">()</span> <span class="o">:</span> <span class="n">InputStream</span>
+<span class="cm">/** the content as byte array */</span>
+<span class="o">+</span> <span class="n">getData</span><span class="o">()</span> <span class="o">:</span> <span class="kt">byte</span><span class="o">[]</span>
+<span class="cm">/** optionally the media type of the content */</span>
+<span class="o">+</span> <span class="n">getMediaType</span><span class="o">()</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** optionally the file name of the content */</span>
+<span class="o">+</span> <span class="n">getFileName</span><span class="o">()</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** optionally additional headers */</span>
+<span class="o">+</span> <span class="n">getHeaders</span><span class="o">()</span> <span class="o">:</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span><span class="n">List</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;&gt;</span>
+</pre></div>
+
+
+<p>The ContentSource interface defines methods for obtaining the wrapped content as InputStream and byte[]. This is mainly to avoid unnecessary copying of content. Implementors of ContentItems SHOULD prefer to call </p>
+<ul>
+<li>ContentSource#getData() if the ContentItem/Blob implementation will store the content as byte[] in-memory</li>
+<li>ContentSource#getStream() if the content of a ContentSource is streamed to a file, database, CMS or any other target outside the JVM.</li>
+</ul>
+<p>The following implementations of this interface are provided by the Stanbol Enhancer servicesapi module:</p>
+<ul>
+<li>StreamSource: A ContentSource wrapping an InputStream. Multiple calls to #getStream() are not supported and will cause IllegalStateExceptions. Calls to #getData() will load the contents of the stream into memory.</li>
+<li>ByteArraySource: A ContentSource implementation that uses a byte array to represent the content. All constructors take the byte array representing the content as parameter. Calls to #getData() MUST NOT copy the byte array to avoid duplications.</li>
+<li>StringSource: A ContentSource implementation that allows passing a String instance directly. The constructors convert the passed String to a byte array by using the passed Charset. UTF-8 is used as default. This implementation is based on the ByteArraySource.</li>
+</ul>
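+<p>The difference between a byte array backed source and a stream backed source can be sketched as follows. The types <code>SketchSource</code>, <code>SketchByteArraySource</code> and <code>SketchStreamSource</code> are simplified, hypothetical versions written for this example; they only mirror the behaviour described in the list above and are not the classes shipped with the servicesapi module.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Reduced, hypothetical version of the ContentSource interface. */
interface SketchSource {
    InputStream getStream();
    byte[] getData();
}

/** Wraps a byte array; getData() hands out the same array without copying. */
final class SketchByteArraySource implements SketchSource {
    private final byte[] data;

    SketchByteArraySource(byte[] data) { this.data = data; }

    public InputStream getStream() { return new ByteArrayInputStream(data); }

    public byte[] getData() { return data; }
}

/** Wraps a one-shot stream; asking for the stream twice is an error. */
final class SketchStreamSource implements SketchSource {
    private final InputStream in;
    private boolean consumed;

    SketchStreamSource(InputStream in) { this.in = in; }

    public InputStream getStream() {
        if (consumed) {
            throw new IllegalStateException("stream was already consumed");
        }
        consumed = true;
        return in;
    }

    /** Reads the whole stream into memory (and thereby consumes it). */
    public byte[] getData() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream s = getStream()) {
            for (int n; (n = s.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

final class ContentSourceDemo {
    public static void main(String[] args) {
        byte[] raw = "some text".getBytes(StandardCharsets.UTF_8);
        System.out.println(new SketchByteArraySource(raw).getData() == raw);
        SketchSource stream = new SketchStreamSource(new ByteArrayInputStream(raw));
        System.out.println(stream.getData().length);
    }
}
```

+<p>An in-memory ContentItem implementation would call <code>getData()</code> on such a source, while a file-based one would copy from <code>getStream()</code>, following the recommendation given above.</p>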
+<h3 id="contentreference">ContentReference</h3>
+<p>This interface allows describing content that is not yet locally available. The Stanbol Enhancer will dereference the content automatically when needed.</p>
+<div class="codehilite"><pre><span class="cm">/** the Reference to the content */</span>
+<span class="o">+</span> <span class="n">getReference</span><span class="o">()</span> <span class="o">:</span> <span class="n">String</span>
+<span class="cm">/** dereferences the content */</span>
+<span class="o">+</span> <span class="n">dereference</span><span class="o">()</span> <span class="o">:</span> <span class="n">ContentSource</span>
+</pre></div>
+
+
+<p>When referenced content is dereferenced by the Stanbol Enhancer depends on many factors. Earliest it may be dereferenced by the createBlob/createContentItem methods of a ContentItemFactory implementation. At latest it will be dereferenced when the referenced content is first used by the Stanbol Enhancer (e.g. on a call to ContentItem#getStream() or ContentItem#getMimeType()).</p>
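+<p>The "at latest on first use" case can be pictured with a lazily loading wrapper. The class <code>LazyBlobSketch</code> is hypothetical and uses a plain <code>java.util.function.Supplier</code> in place of a ContentReference; it only illustrates the timing and is not how any Stanbol implementation is necessarily written.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical Blob-like wrapper that dereferences its content on first use. */
final class LazyBlobSketch {
    private final Supplier<byte[]> dereference; // stands in for ContentReference#dereference()
    private byte[] data;                        // null until the content is first needed
    int dereferenceCalls;                       // counter kept only for the demonstration

    LazyBlobSketch(Supplier<byte[]> dereference) {
        this.dereference = dereference;
    }

    synchronized InputStream getStream() {
        if (data == null) {
            data = dereference.get();
            dereferenceCalls++;
        }
        return new ByteArrayInputStream(data);
    }

    public static void main(String[] args) {
        LazyBlobSketch blob = new LazyBlobSketch(() -> new byte[] {1, 2, 3});
        System.out.println("calls before first use: " + blob.dereferenceCalls);
        blob.getStream();
        blob.getStream();
        System.out.println("calls after two reads: " + blob.dereferenceCalls);
    }
}
```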
+<p>By default a ContentReference implementation for Java URLs is provided by the Stanbol Enhancer servicesapi module. This implementation replaces the WebContentItem that was used for obtaining content from URLs until Stanbol version 0.9.0-incubating. </p>
+<h3 id="contentsink">ContentSink</h3>
+<p>EnhancementEngines that convert passed content (e.g. the <a href="engines/tikaengine.html">TikaEngine</a>) are often capable of stream processing, meaning that they do not need to load the whole content into memory while analyzing it. To support this mode of operation within the Stanbol Enhancer as well, the ContentSink interface plays an important role: it allows creating an initially empty Blob and then "streaming" the content to it while processing the content.</p>
+<p>The following method of the ContentItemFactory can be used to create a ContentSink</p>
+<div class="codehilite"><pre><span class="cm">/** Creates a new ContentSink */</span>
+<span class="o">+</span> <span class="n">createContentSink</span><span class="o">(</span><span class="n">String</span> <span class="n">mediaType</span><span class="o">)</span> <span class="o">:</span> <span class="n">ContentSink</span><span class="o">;</span>
+</pre></div>
+
+
+<p>The ContentSink interface provides the OutputStream as well as the created Blob</p>
+<div class="codehilite"><pre><span class="cm">/** Getter for the OutputStream */</span>
+<span class="o">+</span> <span class="n">getOutputStream</span><span class="o">()</span> <span class="o">:</span> <span class="n">OutputStream</span><span class="o">;</span>
+<span class="cm">/** Getter for the Blob */</span>
+<span class="o">+</span> <span class="n">getBlob</span><span class="o">()</span> <span class="o">:</span> <span class="n">Blob</span><span class="o">;</span>
+</pre></div>
+
+
+<p><strong>Note:</strong> Users MUST NOT pass the Blob of a ContentSink to any other component until all the data are written to the OutputStream, because this may cause other components to read partial data when calling Blob#getStream(). This feature is intended to reduce the memory footprint and not to support concurrent writing and reading of data as supported by pipes.</p>
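+<p>Why an early reader may see partial data can be shown with a minimal in-memory sink. <code>InMemorySinkSketch</code> is a hypothetical class for this example; its <code>blobStream()</code> method stands in for Blob#getStream() and the real implementations may buffer differently.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical in-memory sink; not the Stanbol implementation. */
final class InMemorySinkSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    /** The stream a converting engine would write to. */
    OutputStream getOutputStream() {
        return buffer;
    }

    /** Convenience for the demo: appends UTF-8 encoded text to the sink. */
    void write(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        buffer.write(bytes, 0, bytes.length);
    }

    /** Stands in for Blob#getStream(): a snapshot of what was written so far. */
    InputStream blobStream() {
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) {
        InMemorySinkSketch sink = new InMemorySinkSketch();
        sink.write("first half, ");
        int early = sink.blobStream().available();    // a reader arriving too early
        sink.write("second half");
        int complete = sink.blobStream().available(); // a reader after writing finished
        System.out.println("early reader saw " + early + " bytes, complete content has " + complete);
    }
}
```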
+<h4 id="intended-usage">Intended Usage:</h4>
+<p>This example shows a typical usage of a ContentSink within the processEnhancement(..) method of an EnhancementEngine that needs to transform some content.</p>
+<div class="codehilite"><pre><span class="n">ContentItem</span> <span class="n">ci</span><span class="o">;</span> <span class="c1">//the content item to process</span>
+<span class="n">ContentSink</span> <span class="n">plainTextSink</span> <span class="o">=</span> <span class="n">contentItemFactory</span><span class="o">.</span><span class="na">createContentSink</span><span class="o">(</span><span class="s">&quot;text/plain&quot;</span><span class="o">);</span>
+<span class="n">Writer</span> <span class="n">writer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">OutputStreamWriter</span><span class="o">(</span><span class="n">plainTextSink</span><span class="o">.</span><span class="na">getOutputStream</span><span class="o">(),</span><span class="s">&quot;UTF-8&quot;</span><span class="o">);</span>
+<span class="k">try</span> <span class="o">{</span>
+    <span class="c1">// pass the writer to the framework that extracts the text</span>
+<span class="o">}</span> <span class="k">finally</span> <span class="o">{</span>
+    <span class="n">IOUtils</span><span class="o">.</span><span class="na">closeQuietly</span><span class="o">(</span><span class="n">writer</span><span class="o">);</span>
+<span class="o">}</span>
+<span class="c1">//now add the Blob to the ContentItem</span>
+<span class="n">UriRef</span> <span class="n">textBlobUri</span><span class="o">;</span> <span class="c1">//create an UriRef for the Blob</span>
+<span class="n">ci</span><span class="o">.</span><span class="na">addPart</span><span class="o">(</span><span class="n">textBlobUri</span><span class="o">,</span> <span class="n">plainTextSink</span><span class="o">.</span><span class="na">getBlob</span><span class="o">());</span>
+<span class="n">plainTextSink</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
+</pre></div>
+  </div>
+  
+  <div id="footer">
+    <div class="copyright">
+      <p>
+        Copyright &copy; 2010 The Apache Software Foundation, Licensed under 
+        the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.
+        <br />
+        Apache, Stanbol and the Apache feather and Stanbol logos are trademarks of The Apache Software Foundation.
+      </p>
+    </div>
+  </div>
+  
+</body>
+</html>