Posted to commits@stanbol.apache.org by "Suat Gonul (JIRA)" <ji...@apache.org> on 2012/06/01 17:54:23 UTC

[jira] [Updated] (STANBOL-498) Contenthub: Enhanced ContentItem Store

     [ https://issues.apache.org/jira/browse/STANBOL-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suat Gonul updated STANBOL-498:
-------------------------------

    Description: 
Simple Storage interface for enhanced ContentItems.

This Store is used to

1. save the ContentItems after they are enhanced by the Enhancer
    * The Blobs (original content and transcoded versions)
    * The Metadata (Enhancement Results)
2. retrieve ContentItems during semantic indexing
    * Iterator over the IDs
    * Get ContentItem by ID

This store is NOT intended to be used for search! It is only used for ID-based lookup.


Implementations:
-----------------------

 * CMS Adapter: An implementation based on the CMS Adapter makes it possible to store the Enhancement Results directly within the CMS. Typically this will be the same CMS that sends the request to the Contenthub, but this is not a requirement.
 * Clerezza based implementation: Clerezza, as an RDF based CMS, provides the required functionality to store both the content AND the metadata of the ContentItem.
 * File based: Simple file based storage without any external dependencies. This could be used as the default and for testing.

Interface:
-------------

The interface will be based on, or replace, the [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java) interface already present in the Contenthub. However, the suggestion is to remove "getEnhancementGraph()", as it is not required by use cases (1) and (2) mentioned above. In addition, the store interface should be extended with a remove method to allow manual deletion of ContentItems.

    /** stores the passed ContentItem */
    + put(ContentItem ci) : UriRef
    /** Getter for the ContentItem with the passed ID */
    + get(UriRef id) : ContentItem
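
Rendered as Java, the two operations above plus the suggested remove method might look like the following minimal sketch. `Item`, `SimpleStore`, `InMemoryStore`, and the use of plain `String` IDs in place of `UriRef` are illustrative assumptions and not the Contenthub API; in particular, the signature of the remove method is not specified in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ContentItem: an ID plus the raw content. */
final class Item {
    final String id;
    final byte[] content;

    Item(String id, byte[] content) {
        this.id = id;
        this.content = content;
    }
}

/** Hypothetical rendering of the proposed operations; String replaces UriRef. */
interface SimpleStore {
    /** Stores the passed item and returns its ID. */
    String put(Item item);

    /** Returns the item with the passed ID, or null if there is none. */
    Item get(String id);

    /** Assumed signature for the suggested remove method. */
    void remove(String id);
}

/** In-memory implementation, intended for tests only. */
final class InMemoryStore implements SimpleStore {
    private final Map<String, Item> items = new ConcurrentHashMap<>();

    @Override
    public String put(Item item) {
        items.put(item.id, item);
        return item.id;
    }

    @Override
    public Item get(String id) {
        return items.get(id);
    }

    @Override
    public void remove(String id) {
        items.remove(id);
    }
}
```

Returning `null` from `get` for an unknown or removed ID matches the lookup behaviour the description relies on when it discusses revisions.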

### Revisions

Revisions are used to re-synchronize semantic indexes with the enhanced ContentItems managed by this store. Every time the Contenthub indexes an enhanced ContentItem managed by this store to a SemanticIndex, it provides the highest revision. SemanticIndexes MUST persist these revisions and MUST ensure they remain available after a restart, because the Contenthub will later use this number to apply changes to enhanced ContentItems.

In detail, a revision is defined as a change (add, update, removal) to one or more ContentItems managed by the Store. Every such change MUST result in an increase of the revision. Revisions MUST only use positive numbers. Implementers might use <code>System.currentTimeMillis()</code> as the revision, but this is not a requirement.

The store interface provides a method that returns all ContentItems that were changed (added, updated, removed) since a given revision.

    /** IDs of all ContentItems added, updated or removed after the given revision */
    + changes(long revision, int offset, int batchSize) : ChangeSet

    class ChangeSet {
        /** the lowest included revision */
        + from() : long
        /** the IDs of the changed ContentItems */
        + changed() : Set<UriRef>
        /** the highest included revision */
        + to() : long
    }


Calls to changes(..) MUST return only changes with a higher revision than the provided number. Changes with the passed revision number itself MUST be excluded. Note that a ChangeSet does not provide information about the type of the change. This is only available after a call to Store#get(..).

The revisions MUST NOT keep a history of changes. Only the revision of the latest change to a ContentItem MUST be kept. This ensures that rebuilding a semantic index (from revision -1) performs only the indexing steps needed for the current state of the Store rather than replaying its history. Note also that the revisions do not provide information about the type of the change. Whether a ContentItem is still present (added, updated) or was removed is indicated by the get(..) method of the store returning a ContentItem instance or <code>null</code>.
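
How an index could consume changes(..) is sketched below in Java. This is an illustration under stated assumptions, not the Contenthub implementation: `StubStore` and `IndexSynchronizer` are hypothetical names, `String` IDs stand in for `UriRef`, and `offset`/`batchSize` are assumed to page through the changes ordered by revision, which this issue does not define. The loop excludes the passed revision, uses the result of get(..) to tell removals from additions and updates, and remembers the highest revision seen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical ChangeSet; String IDs stand in for UriRef. */
final class ChangeSet {
    final long from;            // lowest included revision
    final long to;              // highest included revision
    final Set<String> changed;  // IDs of the changed items

    ChangeSet(long from, long to, Set<String> changed) {
        this.from = from;
        this.to = to;
        this.changed = changed;
    }
}

/** Hypothetical store stub: keeps only the latest revision per ID. */
final class StubStore {
    private final TreeMap<String, Long> latest = new TreeMap<>();
    private final Set<String> present = new HashSet<>();

    /** Records a change; exists=false models a removal. */
    void record(String id, long revision, boolean exists) {
        latest.put(id, revision);
        if (exists) present.add(id); else present.remove(id);
    }

    /** Stand-in for get(..): non-null while the item exists. */
    String get(String id) {
        return present.contains(id) ? id : null;
    }

    /** Assumption: offset/batchSize page through changes ordered by revision. */
    ChangeSet changes(long revision, int offset, int batchSize) {
        List<Map.Entry<String, Long>> after = new ArrayList<>();
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            if (e.getValue() > revision) after.add(e);   // strictly newer only
        }
        after.sort(Map.Entry.comparingByValue());
        Set<String> ids = new LinkedHashSet<>();
        long from = revision, to = revision;             // values for an empty page
        int end = Math.min(after.size(), offset + batchSize);
        for (int i = offset; i < end; i++) {
            if (ids.isEmpty()) from = after.get(i).getValue();
            to = after.get(i).getValue();
            ids.add(after.get(i).getKey());
        }
        return new ChangeSet(from, to, ids);
    }
}

/** Index-side loop: pages through the changes and applies them. */
final class IndexSynchronizer {
    final Set<String> indexed = new LinkedHashSet<>();
    private long lastRevision;   // a real index would have to persist this

    IndexSynchronizer(long lastRevision) {
        this.lastRevision = lastRevision;
    }

    long sync(StubStore store, int batchSize) {
        long highest = lastRevision;
        int offset = 0;
        while (true) {
            ChangeSet page = store.changes(lastRevision, offset, batchSize);
            if (page.changed.isEmpty()) break;
            for (String id : page.changed) {
                // The change type is only known after the lookup.
                if (store.get(id) == null) indexed.remove(id); else indexed.add(id);
            }
            highest = Math.max(highest, page.to);
            offset += page.changed.size();
        }
        lastRevision = highest;
        return lastRevision;
    }
}
```

Starting an `IndexSynchronizer` at -1 corresponds to a full rebuild. The sketch ignores changes that arrive between two pages of the same run; a real implementation would need to decide how paging behaves in that case.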

#### Example:

For example, if ContentItems 1, 2 and 3 are added first, ContentItem 2 is later updated and 3 is deleted, and in a third step ContentItems 3 and 4 are added, this results in the following revision data:

After step 1: 

    :::text
    1 : urn:contentItem.1 //added
    1 : urn:contentItem.2 //added
    1 : urn:contentItem.3 //added

After step 2: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    2 : urn:contentItem.3 //removed

After step 3: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    3 : urn:contentItem.3 //added
    3 : urn:contentItem.4 //added
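
The three steps can be replayed with a small Java sketch of the bookkeeping. `RevisionLog` and `RevisionExample` are hypothetical names, `String` IDs stand in for `UriRef`, and a simple counter is assumed as the revision; the point is only that each ID carries the revision of its latest change and no history is retained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical store bookkeeping: one revision entry per ID, no history. */
final class RevisionLog {
    private long revision = 0;                                     // last assigned revision
    private final TreeMap<String, Long> latest = new TreeMap<>();  // id -> revision of latest change
    private final Set<String> present = new HashSet<>();           // IDs that currently exist

    /** Applies adds/updates and removals as one change, i.e. one new revision. */
    long change(List<String> upserts, List<String> removals) {
        revision++;
        for (String id : upserts) {
            present.add(id);
            latest.put(id, revision);   // overwrites the previous entry
        }
        for (String id : removals) {
            present.remove(id);
            latest.put(id, revision);   // removals are recorded as changes too
        }
        return revision;
    }

    /** Mirrors get(..) returning a ContentItem (true) or null (false). */
    boolean exists(String id) {
        return present.contains(id);
    }

    /** Revision data as "revision : id" lines, ordered by revision, then ID. */
    List<String> dump() {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(latest.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            lines.add(e.getValue() + " : " + e.getKey());
        }
        return lines;
    }
}

final class RevisionExample {
    public static void main(String[] args) {
        String c1 = "urn:contentItem.1", c2 = "urn:contentItem.2";
        String c3 = "urn:contentItem.3", c4 = "urn:contentItem.4";
        RevisionLog log = new RevisionLog();
        log.change(Arrays.asList(c1, c2, c3), Collections.<String>emptyList()); // step 1
        log.change(Arrays.asList(c2), Arrays.asList(c3));                       // step 2
        log.change(Arrays.asList(c3, c4), Collections.<String>emptyList());     // step 3
        for (String line : log.dump()) {
            System.out.println(line);
        }
    }
}
```

Running `RevisionExample` prints the same four revision/ID pairs as the "After step 3" listing, without the trailing annotations. The entry for urn:contentItem.3 at revision 2 is overwritten in step 3, so a rebuild never sees the intermediate removal.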



  was:
Simple Storage interface for enhanced ContentItems.

This Store is used to

1. save the ContentItems after they are enhanced by the Enahncer
    * The Blobs (original content and transcoded versions)
    * The Metadata (Enhancement Results)
2. retrieve ContentItems while semantic indexing
    * Iterator over the IDs
    * Get ContentItem by ID

This store is NOT intended to be used for search! It is only used for ID based lookup.


Implementations:
-----------------------

 * CMS Adapter: An implementation based on the CMS Adapter provides the possibility to store the Enhancement Results directly within the CMS. Typically this will be the CMS also sending the request to the Contenthub, but this is no requirement.
 * Clerezza based implementation: Clerezza - as RDF based CMS - provides the required functionality to store both the content AND the metadata of the contentItem
* File based: Simple file based storage without any external dependencies. This could be used as default and for testing

Interface:
-------------

The interface will be based-on/replace the [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java) interface already present in the Contenthub. However the suggestion is to remove the "getEnhancementGraph()" as this is not required by the usecases (1) and (2) mentioned above. In addition the store interface should be extended with a remove method to allow manual deletion of ContentItems.

    /** creates a new ContentItem */
    + create(UriRef id, byte[] content, String contentType) : ContentItem
    + create(UriRef id, InputStream in, String contentType) : ContentItem
    /** stores the parsed ContentItem */
    + put(ContentItem ci) : UriRef
    /** Getter for the ContentItem with the parsed ID */
    + get(UriRef id) : ContentItem
    
### Revisions

Revisions are used to re-synchronize semantic indexes with the enhanced ContentItems managed by this store. Every time the ContentHub indexes enhanced ContentItem - as managed by this store - to a SemanticIndex it provides the highest revision. SemanticIndexes MUST persist such revisions and MUST ensure they are even available after a re-start because this number will be later used by the ContentHub to apply changes to enhances ContentItmes.

In detail a revision is defined as a change (add, update, removal) to one or more ContentItems managed by the Store. Every such change MUST BE result in an increase of the revision. Revisions MUST only use positive numbers. Implementers might use <code>System.currentTimeMillis()</code> as revision but this is no requirement.

The store interface provides a method that returns an Iterator over all changed ContentItems that where changed (added, updated, removed) since a given revision. 

    /** Iterator over all contentItems added/removed after revision */
    + changes(long revision, int maxEntries) : Changes

    class Changes {
        /** the lowest included revision */
        + from() : long
        /** the id of changed ContentItems */
        + changed() : Map<UriRef>
        /** the highest included revision */
        + to() : long
    }


Calls to chages(..) MUST return only changes with a higher revision as the provided number. Changes with the parsed revision number MUST BE excluded. Note that Changes does not provide information about the type of the change. This will be only available after a call to Store#get(..).

The revisions MUST NOT to keep a history of changes. Only the revision of the latested change MUST be kept. This ensures that rebuilding a semantic index (from revsion -1) does only perform indexing steps corresponding to historical state of the Store. Note also that the revisions do not provide information about the type of the change. If a ContentItem is still present (added, updated) or was removed will be indicated by the get(..) method of the store returning a ContentItem instance or <code>null</code>

#### Example:

e.g. if first the contentItem 1,2 and 3 are added, later content item 2 is updated and 3 is deleted and in a third step contentitem 3 and 4 are added this would result in the following revision data

After step 1: 

    :::text
    1 : urn:contentItem.1 //added
    1 : urn:contentItem.2 //added
    1 : urn:contentItem.3 //added

After step 2: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    2 : urn:contentItem.3 //removed

After step 3: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    3 : urn:contentItem.3 //added
    3 : urn:contentItem.4 //added



    
> Contenthub: Enhanced ContentItem Store
> --------------------------------------
>
>                 Key: STANBOL-498
>                 URL: https://issues.apache.org/jira/browse/STANBOL-498
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Content Hub
>            Reporter: Rupert Westenthaler

--
This message is automatically generated by JIRA.