Posted to commits@stanbol.apache.org by "Rupert Westenthaler (Resolved) (JIRA)" <ji...@apache.org> on 2012/04/11 11:52:24 UTC

[jira] [Resolved] (STANBOL-579) Allow streaming of transformed content to Blobs

     [ https://issues.apache.org/jira/browse/STANBOL-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-579.
-----------------------------------------

    Resolution: Fixed

implemented and documented with #1324645
                
> Allow streaming of transformed content to Blobs
> -----------------------------------------------
>
>                 Key: STANBOL-579
>                 URL: https://issues.apache.org/jira/browse/STANBOL-579
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> While adapting the TikaEngine and the MetaxaEngine to the new model ContentItemFactory pattern, I realized that it is important to support streaming of content to a Blob. Otherwise, these kinds of engines would need to temporarily hold the whole transformed version of the content (e.g. the extracted text/plain, XHTML, ...) before they could create a new Blob via one of the ContentItemFactory#createBlob(...) methods.
> The following extension to the ContentItemFactory avoids this issue and allows content to be "streamed" to a Blob.
> Method added to the ContentItemFactory
>     /** Creates a new ContentSink */
>     + createContentSink(String mediaType) : ContentSink;
> and the new Interface ContentSink
>     /** Getter for the OutputStream */
>     + getOutputStream() : OutputStream;
>     /** Getter for the Blob */
>     + getBlob() : Blob;
> __Note:__ Users MUST NOT pass the Blob of a ContentSink to any other component until all the data has been written to the OutputStream, because other components might otherwise read partial data when calling Blob#getStream(). This feature is intended to reduce the memory footprint, not to support concurrent writing and reading of data as pipes do.
> __Intended Usage:__
> This example shows a typical usage of a ContentSink within the processEnhancement(..) method of an EnhancementEngine
>     ContentItem ci; //the content item to process
>     ContentSink plainTextSink = contentItemFactory.createContentSink("text/plain");
>     Writer writer = new OutputStreamWriter(plainTextSink.getOutputStream(), "UTF-8");
>     try {
>         // pass the writer to the framework that extracts the text
>     } finally {
>         IOUtils.closeQuietly(writer);
>     }
>     }
>     //now add the Blob to the ContentItem
>     UriRef textBlobUri; //create an UriRef for the Blob
>     ci.addPart(textBlobUri, plainTextSink.getBlob());
>     plainTextSink = null;
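
The write-then-hand-over contract described in the note can be tried out without Stanbol on the classpath. The following is a minimal sketch under stated assumptions, not the committed implementation: the nested Blob and ContentSink interfaces are reduced, hypothetical stand-ins that only mirror the getters listed in the issue, the temp-file backing is one possible way to keep the transformed content out of memory, and extractText(..) is a placeholder for a real extractor such as Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ContentSinkSketch {

    /** Hypothetical, reduced stand-in for the Stanbol Blob (assumption). */
    public interface Blob {
        String getMimeType();
        InputStream getStream();
    }

    /** Mirrors the two getters listed in the issue description. */
    public interface ContentSink {
        OutputStream getOutputStream();
        Blob getBlob();
    }

    /** Hypothetical factory method: backs the sink with a temporary file. */
    public static ContentSink createContentSink(String mediaType) throws IOException {
        Path file = Files.createTempFile("content-sink", ".tmp");
        file.toFile().deleteOnExit();
        OutputStream out = Files.newOutputStream(file);
        Blob blob = new Blob() {
            @Override public String getMimeType() { return mediaType; }
            @Override public InputStream getStream() {
                try {
                    // Reads whatever is in the file right now, so callers must
                    // wait until the writer has been closed.
                    return Files.newInputStream(file);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return new ContentSink() {
            @Override public OutputStream getOutputStream() { return out; }
            @Override public Blob getBlob() { return blob; }
        };
    }

    /** Placeholder for the framework that extracts the text (assumption). */
    public static void extractText(Writer writer) throws IOException {
        writer.write("Hello ");
        writer.write("Stanbol");
    }

    /** Reads the complete Blob content as UTF-8 text. */
    public static String readAll(Blob blob) throws IOException {
        try (InputStream in = blob.getStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        ContentSink sink = createContentSink("text/plain");
        // Closing the writer flushes all data before the Blob is handed on.
        try (Writer writer = new OutputStreamWriter(sink.getOutputStream(), StandardCharsets.UTF_8)) {
            extractText(writer);
        }
        Blob blob = sink.getBlob();
        System.out.println(blob.getMimeType() + ": " + readAll(blob));
    }
}
```

The try-with-resources block plays the role of the try/finally with IOUtils.closeQuietly(..) in the issue's example: the Blob is only read after the writer has been closed, which is the ordering the note requires.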

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira