Posted to dev@stanbol.apache.org by Suat Gonul <su...@gmail.com> on 2011/06/01 08:53:51 UTC

Contenthub structure

Hi everybody,

After discussing with Rupert yesterday, we have come up with a basic 
design for the Contenthub component.

It will provide two main RESTful interfaces to:

1) Upload (register) content and metadata (Available in current 
implementation)
2) Search for registered content

There would be Indexing Engines for (1) and Search Engines for (2). The 
Contenthub would then implement Indexing Engines to store the 
enhancements in a triple store, and Search Engines to search the 
enhancements and content items in that triple store.
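
To make the split concrete, here is a minimal sketch of how an Indexing
Engine and a Search Engine might share one triple store. All names
(TripleStore, IndexingEngine, SearchEngine, the "mentions" predicate) are
assumptions made for illustration, not existing Stanbol APIs, and the
store is a toy in-memory set rather than a real triple store.

```python
from dataclasses import dataclass, field


@dataclass
class TripleStore:
    """Toy in-memory stand-in for a real triple store (hypothetical)."""
    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the pattern; None is a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]


class IndexingEngine:
    """Interface (1): stores the enhancements of an uploaded content item."""

    def __init__(self, store):
        self.store = store

    def index(self, content_id, enhancements):
        # Each enhancement is a (predicate, value) pair about the item.
        for predicate, value in enhancements:
            self.store.add(content_id, predicate, value)


class SearchEngine:
    """Interface (2): finds content items by their stored enhancements."""

    def __init__(self, store):
        self.store = store

    def search(self, predicate, value):
        hits = self.store.match(predicate=predicate, obj=value)
        # Return each matching content item once, in a stable order.
        return sorted({subject for subject, _, _ in hits})
```

Indexing two items and then calling search("mentions", "France") would
return the ids of every item whose enhancements mention France.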

There is also an implementation of the search part already under way in 
the Google Code repository of the IKS project at [1]. It will be 
integrated into the Contenthub component.

What do you think?


Best,
Suat

[1] 
http://code.google.com/p/iks-project/source/browse/sandbox/#sandbox%2Fsearch

Re: Contenthub structure

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all

I will try to sketch a small usage scenario here:

A user posts a query for "CMS workshops in France" to the Contenthub:

The semantic search component of the Contenthub uses several
SearchEngines (similar to the EnhancementEngines in the Enhancer).

1. OntologySearcher: It tries to identify Concepts mentioned in the
search. In this example it will find the Concept "Workshop"
2. EntitySearcher: It tries to find Entities for words used in the
Query. For the example it will find "France"
3. Faceted Search engine: It will compose a Lucene type search for
Documents with
 * a reference to Workshop
 * a reference to France
 * the text "CMS"
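
The three steps above can be sketched roughly as follows. This is only
an illustration: the lookup tables stand in for an ontology and the
Entityhub, and the field names "reference" and "text" are assumed, not
taken from any existing index schema. The output uses the classic Lucene
query syntax (field:"term" clauses joined with AND).

```python
# Hypothetical vocabularies standing in for an ontology and the Entityhub.
CONCEPTS = {"workshop": "Workshop", "workshops": "Workshop"}
ENTITIES = {"france": "France"}
STOPWORDS = {"in", "the", "of"}


def ontology_searcher(tokens):
    """Step 1: map query words to known Concepts."""
    return [CONCEPTS[t.lower()] for t in tokens if t.lower() in CONCEPTS]


def entity_searcher(tokens):
    """Step 2: map query words to known Entities."""
    return [ENTITIES[t.lower()] for t in tokens if t.lower() in ENTITIES]


def compose_query(text):
    """Step 3: compose a Lucene-type query from the results of steps 1 and 2."""
    tokens = text.split()
    references = ontology_searcher(tokens) + entity_searcher(tokens)
    # Words that are neither Concept, Entity nor stopword stay plain text.
    remaining = [
        t for t in tokens
        if t.lower() not in CONCEPTS
        and t.lower() not in ENTITIES
        and t.lower() not in STOPWORDS
    ]
    clauses = [f'reference:"{r}"' for r in references]
    clauses += [f'text:"{t}"' for t in remaining]
    return " AND ".join(clauses)


print(compose_query("CMS workshops in France"))
# reference:"Workshop" AND reference:"France" AND text:"CMS"
```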

If there were another search engine that could understand the internal
structure of the query, one could even search for things
* with the type Workshop
* located within France
* the text "CMS"
and because Workshops are events one could activate Facets for
* Location
* Time
* Participants
* facets explicitly requested with the query (e.g. Tags, Creator ...)
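
A small sketch of how facets could be activated from the type of the
recognised Concept. The type hierarchy and the facet table are made up
for this example; a real setup would presumably read them from an
ontology.

```python
# Hypothetical type hierarchy: Workshop is a kind of Event.
SUPERTYPES = {"Workshop": "Event", "Event": None}

# Hypothetical facets attached to each type.
FACETS_BY_TYPE = {"Event": ["Location", "Time", "Participants"]}


def facets_for(concept, requested=()):
    """Collect facets for a concept, its supertypes and the query itself."""
    facets = []
    current = concept
    # Walk up the hierarchy so that Workshop inherits the Event facets.
    while current is not None:
        for facet in FACETS_BY_TYPE.get(current, []):
            if facet not in facets:
                facets.append(facet)
        current = SUPERTYPES.get(current)
    # Facets explicitly requested with the query come last.
    for facet in requested:
        if facet not in facets:
            facets.append(facet)
    return facets


print(facets_for("Workshop", requested=["Tags", "Creator"]))
# ['Location', 'Time', 'Participants', 'Tags', 'Creator']
```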

So the Idea is to use

* Ontologies (CMS-Adapter & Kres)
* Entityhub
* maybe neural networks with learned query patterns??
* other stuff??

for query preprocessing and

* full text indices over Documents
* full text indices over Facts (like the Workshop)
* SPARQL endpoints over Enhancements
* other things??

for the execution of the enhanced query.

Joining results from the different sources (Documents, Facts,
Enhancements) would be challenging. However I think this feature would
not be necessary for a first version.

I would also like to consider this
[Screencast](http://www.srdc.com.tr/iks/2ndyear/DemoVideo.htm) in the
context of this Usage Scenario.

WDYT
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Contenthub structure

Posted by Olivier Grisel <ol...@ensta.org>.
I think the default search implementation for content should be based
on full-text indexing using the EntityHub's SolrYard, extended with
faceted search.

I find the combination of full-text search and facet-based structured
refinements much more intuitive than the traditional multi-field,
form-based search interface.
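
As a rough illustration of full-text search plus facet refinement, the
sketch below builds the query string of a Solr select request using the
standard q, facet, facet.field and fq parameters. The field names
("location", "type") are assumptions for the example and say nothing
about how the SolrYard actually names its fields.

```python
from urllib.parse import urlencode


def solr_params(text, facet_fields, refinements=None):
    """Build the query string for a full-text search with facets.

    text         -- the user's full-text query (Solr "q")
    facet_fields -- fields to return facet counts for ("facet.field")
    refinements  -- facet values the user already clicked ("fq" filters)
    """
    params = [("q", text), ("facet", "true")]
    params += [("facet.field", name) for name in facet_fields]
    for name, value in (refinements or {}).items():
        # Each refinement narrows the result set without changing "q".
        params.append(("fq", f'{name}:"{value}"'))
    return urlencode(params)


# First request: plain full-text search, asking for facet counts.
first = solr_params("CMS", ["location", "type"])

# Second request: the user refined the results by clicking a facet value.
refined = solr_params("CMS", ["location", "type"], {"location": "France"})
```

The user types one free-text query and then narrows the results by
clicking facet values, instead of filling in a multi-field form up front.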

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel