Posted to dev@stanbol.apache.org by Suat Gonul <su...@gmail.com> on 2012/07/19 16:12:23 UTC

Contenthub 2-layered structure

Hi everyone,

I have just committed the initial implementation of the index part of
the 2-layered structure of Contenthub. So, we now have initial
implementations for both the Store and Index layers. Currently, this
work is carried out under the "contenthub-two-layered-structure" branch.
So, to try out this new structure, the contenthub module under this
branch should be built.

I would be very glad to hear your feedback. Below, you can see the logs
from the commit:

Best,
Suat

Logs:
Initial version of the default implementation of the SemanticIndex
interface which is defined in STANBOL-499.

SemanticIndex is one part of the 2-layered structure of Contenthub. The
other part is the Store which is defined in STANBOL-498.

The default implementation of the SemanticIndex interface
(LDPathSemanticIndex) is based on the LDPath language. A new
LDPathSemanticIndex can be created by providing name, description and
LDPath values. Within LDPathSemanticIndex, the provided LDPath program
is used in two ways, which are explained later in this log.

Each instance of this implementation checks for changes in the Store at
regular intervals in a separate thread; the interval length is
configurable. After processing the changes in the Store, the last
revision is stored persistently. In this way, when the index is
restarted, it checks for changes as of the latest persisted revision.
However, when the LDPath program is changed, the LDPathSemanticIndex
indexes the ContentItems from scratch. During this period the index is
in the REINDEXING state and does not allow other index or remove
operations. After reindexing is completed, the state of the index
returns to ACTIVE.
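
To make the revision tracking and the two states more concrete, here is a
minimal sketch of one polling cycle. It is not the code from the branch:
PollingIndexSketch, InMemoryStore, changesSince and all method names are
made up for illustration, the "last revision" is kept in memory instead of
being persisted, and removals are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, simplified model of the polling behaviour; not the Stanbol API. */
class PollingIndexSketch {
    enum State { ACTIVE, REINDEXING }

    /** Stand-in for the Store: remembers which URI changed at which revision. */
    static class InMemoryStore {
        private final TreeMap<Long, String> changes = new TreeMap<>();

        void put(long revision, String uri) { changes.put(revision, uri); }

        /** Changes with a revision strictly greater than the given one, oldest first. */
        TreeMap<Long, String> changesSince(long revision) {
            return new TreeMap<>(changes.tailMap(revision, false));
        }
    }

    private final InMemoryStore store;
    private final List<String> indexed = new ArrayList<>();
    private long lastRevision = -1L; // the real index would persist this value
    private State state = State.ACTIVE;

    PollingIndexSketch(InMemoryStore store) { this.store = store; }

    State state() { return state; }

    int indexedCount() { return indexed.size(); }

    /** One polling cycle; returns the number of processed changes. */
    synchronized int pollOnce() {
        if (state == State.REINDEXING) {
            return 0; // regular updates are not allowed while reindexing
        }
        return applyChangesSince(lastRevision);
    }

    /** A changed LDPath program invalidates everything indexed so far. */
    synchronized void changeLdPath() {
        state = State.REINDEXING;
        indexed.clear();
        lastRevision = -1L;
    }

    /** Rebuilds the index from the first revision and reopens it. */
    synchronized void completeReindex() {
        applyChangesSince(-1L);
        state = State.ACTIVE;
    }

    private int applyChangesSince(long revision) {
        TreeMap<Long, String> pending = store.changesSince(revision);
        indexed.addAll(pending.values());
        if (!pending.isEmpty()) {
            lastRevision = pending.lastKey();
        }
        return pending.size();
    }
}
```

In a running system, pollOnce() could be triggered from a
java.util.concurrent.ScheduledExecutorService via scheduleWithFixedDelay,
with the delay taken from the configured interval.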

LDPath usages in LDPathSemanticIndex
====================================
a) It is used to configure the underlying Solr core. The LDPath program
determines the index fields, and Solr-specific properties such as
"multiValued" and "termVectors" can be configured.

b) While a ContentItem is being indexed, each named entity contained in
the enhancements of the ContentItem is queried through the Entityhub.
The values obtained from the Entityhub are then indexed along with the
actual content as additional metadata. This additional metadata is
completely compatible with the underlying Solr core.

This ability to create customized indexes allows compatibility with
different domains or use-cases.

Creating, Retrieving LDPathSemanticIndex instances
==================================================
The {stanbol_host}/index endpoint can be used to retrieve already
registered SemanticIndexes. An LDPathSemanticIndex can be created
through the RESTful service, i.e. {stanbol_host}/index/ldpath, or
through the Felix Web Console by configuring an "Apache Stanbol
Contenthub LDPath Based Semantic Index".

Each instance of LDPathSemanticIndex is registered as an OSGi component,
so instances can be obtained through ServiceTracker/@Reference. The
name (Semantic-Index-Name) and description (Semantic-Index-Name)
properties can be used to retrieve specific instances of
LDPathSemanticIndex from the OSGi environment. Also, the
SemanticIndexManager service provides retrieval of indexes according to
their names and EndpointTypes.
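
As an illustration of the lookup by name, the sketch below builds the
filter string one could use for such a service lookup. Only the property
name Semantic-Index-Name comes from the text above; the class
IndexFilterSketch and its methods are hypothetical helpers, and the
escaping follows the usual OSGi/LDAP filter rules (backslash before
'\', '*', '(' and ')').

```java
/** Hypothetical helper that builds an OSGi filter string for an index name. */
class IndexFilterSketch {
    /** Service property mentioned in the commit log. */
    static final String NAME_PROPERTY = "Semantic-Index-Name";

    /** Escapes the characters that have a special meaning in filter values. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '*' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Filter matching services whose name property equals the given name. */
    static String byName(String indexName) {
        return "(" + NAME_PROPERTY + "=" + escape(indexName) + ")";
    }
}
```

The resulting string could be handed to BundleContext.createFilter(String)
for use with a ServiceTracker, or used as the target of an @Reference.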

Search over the LDPathSemanticIndex
===================================
The previous search functionality of the Contenthub has not changed.
It is wrapped under two types of endpoints: 1) RESTful endpoints and 2)
OSGi-based Java endpoints. There are two RESTful endpoints: SOLR and
CONTENTHUB. The SOLR endpoint can be used to query the actual underlying
Solr core. The CONTENTHUB endpoint offers a search option whose results
contain extra information in addition to the resulting documents: facets
for the resulting documents and keywords related to the original query
term. This endpoint is more experimental and open to changes.

Re: Contenthub 2-layered structure

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

here is an update about this.

Today I had a first look at the branch [1], especially with the
intention of validating the possible usage of this functionality
with the Entityhub as well. For that reason I changed the Store and
SemanticIndex to use generics. This allows implementing them not only
for ContentItem (as needed by the Contenthub) but also for
Representation (as used by the Entityhub).

The result looks promising, so Suat and I discussed splitting the
generic interfaces/implementation from the Contenthub-specific ones
(see STANBOL-701). The commonly shared interfaces will reside under
"commons.semanticindex" and include:

* Store interface (maybe feature-reduced to be read-only and renamed
to IndexingSource)
* StoreManager: Managing interface that allows looking up multiple
Store (or IndexingSource) instances.
* SemanticIndex and SemanticIndexManager (STANBOL-499)
* I would like to have a way for a Store (or IndexingSource) to
notify others (mainly SemanticIndex) about changed Entities.
Currently they need to be asked periodically about changes (see the
Store#changes() method). Something like an ItemNotifier, maybe using
the OSGi Event mechanism. Semantic Indexes could then use the
Store#changes() method to get up to date when they are activated and
then use the EntityNotifier functionality to keep in sync while they
are active.
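
A rough sketch of how the last point could look with generics: catch up
once on activation, then rely on notifications. All names here
(NotifierSketch, NotifyingStore, SyncedIndex, addListener) are invented
for illustration, revisions are simply positions in a list, and plain
Java listeners stand in for the OSGi Event mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of "catch up, then listen"; not the interfaces in the branch. */
class NotifierSketch {

    /** Generic store that records changes and notifies registered listeners. */
    static class NotifyingStore<T> {
        private final List<T> log = new ArrayList<>(); // list position acts as revision
        private final List<Consumer<T>> listeners = new ArrayList<>();

        void put(T item) {
            log.add(item);
            for (Consumer<T> listener : listeners) {
                listener.accept(item);
            }
        }

        /** All items changed at or after the given revision. */
        List<T> changes(int sinceRevision) {
            return new ArrayList<>(log.subList(sinceRevision, log.size()));
        }

        void addListener(Consumer<T> listener) { listeners.add(listener); }
    }

    /** Generic index that stays in sync with a NotifyingStore. */
    static class SyncedIndex<T> {
        private final List<T> indexed = new ArrayList<>();

        /** Catch up via changes(), then keep in sync through notifications. */
        void activate(NotifyingStore<T> store, int lastSeenRevision) {
            indexed.addAll(store.changes(lastSeenRevision));
            store.addListener(item -> indexed.add(item));
        }

        List<T> indexed() { return indexed; }
    }
}
```

The same two classes would work for ContentItem as well as for
Representation, which is the point of making the interfaces generic.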

As soon as this is available I will further evaluate how to use this
with the Entityhub.

best
Rupert



[1] http://svn.apache.org/repos/asf/incubator/stanbol/branches/contenthub-two-layered-structure/contenthub/

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Contenthub 2-layered structure

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Fabian, all

Yes, this is still being developed within its own branch that we started
during the Hackathon in Saarbrücken. But you are completely right, this
development was - until now - not visible enough to the community.
Especially because this design of splitting up

1st level storage that keeps the data and a
2nd level storage that allows building special indexes of the data

is something that is not only interesting for the Contenthub, but
might also be adopted by the Entityhub. Especially when we want to
have the functionality of the Entityhub indexing tool available within
the Stanbol environment. This would require a storage for the
Entity data (could even be a remote service) and a 2nd storage that
holds the indexed data.

Such a design could even give us more flexibility to build special
indexes - e.g. adding surface forms as alternate labels, collecting
mentions for MLT queries, following related, broader or other
relations to build semantic contexts ... capabilities like that would
be key for adding things like Entity-Disambiguation to Apache Stanbol.
Especially if you want to use it with user-managed vocabularies -
without re-indexing the whole vocabulary after changes.

best
Rupert


-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Contenthub 2-layered structure

Posted by Fabian Christ <ch...@googlemail.com>.
Hi,

2012/7/20 Suat Gonul <su...@gmail.com>:
> Again let me remind you that, this work is carried on under the
> "contenthub-two-layered-structure" branch. Sorry for the bulk update,

I am sorry Suat! - I did not realize that you are still working in
your own branch. I thought that there had been a big change to the
trunk without any prior notification. In this case - everything
is fine ;)

Best,
 - Fabian

-- 
Fabian
http://twitter.com/fctwitt

Re: Contenthub 2-layered structure

Posted by Suat Gonul <su...@gmail.com>.
Hi Fabian,

You are right, there are a bit too many changes. So, let me recap this
new structure from earlier this year with a summary. From now on, we
can still follow an incremental approach as we are still in the early
steps.

As introduced in STANBOL-471, Contenthub will be composed of two layers:
Store and Index. The Store layer is intended to keep the Content Items
as they are (not to index them). The Index layer, on the other hand, is
expected to track the changes in the Store layer and update the
underlying index according to index-scoped configurations. A diagram of
this structure can be found in [1].

To get a better idea of the explanations below, I suggest reading
STANBOL-498 and STANBOL-499 first.

File Based Store Implementation (contenthub/store/file)
=======================================================
The initial implementation of the Store layer is a file-based one. It
serializes a Content Item into a zip file such that the main part, the
enhancements and the content parts are each an entry in the zip file.
However, currently only Blob- and TripleCollection-typed parts are
supported. When a Content Item is stored or deleted, its revision number
is set to the current system time. So, given a revision, it is possible
to get the URIs of changed Content Items. The revisions of the Content
Items are stored in a Derby database.
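
To illustrate the "one zip entry per part" idea, here is a small
self-contained sketch using java.util.zip. It is not the code from
contenthub/store/file: ZipStoreSketch, the part names and the in-memory
byte arrays are assumptions made for the example, and the revision
bookkeeping in Derby is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: every part of a content item becomes one zip entry. */
class ZipStoreSketch {

    /** Writes each named part as a separate entry of an in-memory zip archive. */
    static byte[] serialize(Map<String, byte[]> parts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> part : parts.entrySet()) {
                zip.putNextEntry(new ZipEntry(part.getKey()));
                zip.write(part.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the archive back into a map from entry name to content. */
    static Map<String, byte[]> deserialize(byte[] archive) {
        Map<String, byte[]> parts = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] buffer = new byte[4096];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, read);
                }
                parts.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }
}
```

A store built this way would write the archive to a file per Content Item
and record a revision (for example System.currentTimeMillis()) for its URI
on every store or delete.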

LDPath Based Index Implementation (contenthub/index)
====================================================
Already described in the first mail in this thread.

Again, let me remind you that this work is carried out under the
"contenthub-two-layered-structure" branch. Sorry for the bulk update.

Best,
Suat

[1]
https://issues.apache.org/jira/secure/attachment/12512102/contenthub-2layered-storage.jpg


Re: Contenthub 2-layered structure

Posted by Fabian Christ <ch...@googlemail.com>.
Hi Suat,

while I truly appreciate the new developments, it would have been nice
to have some more information on this list about what you guys are
doing. Maybe I missed something. The community has to be kept informed
and get a chance to follow what is happening. Next time I would suggest
trying a more incremental approach instead of submitting a big patch
with tons of changes at once. This is just about the process, not about
the great contributions you made :)

I will also have a closer look next week.

Best,
 - Fabian

-- 
Fabian
http://twitter.com/fctwitt

Re: Contenthub 2-layered structure

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Suat,

Great news! I will have a detailed look next week.

best
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Contenthub 2-layered structure

Posted by Suat Gonul <su...@gmail.com>.
By the way, STANBOL-471 is the initial issue dedicated to this structure.

