Posted to dev@stanbol.apache.org by Rupert Westenthaler <rw...@apache.org> on 2011/02/28 22:59:12 UTC

Stanbol Enhancement Structure (discussion)

Hi all,

Today I committed a first proposal for the Stanbol Enhancement
Structure that will replace the FISE Enhancement Structure currently
used by the Stanbol Enhancer.

The proposal can be found at
  http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html

The source is located at
  http://svn.apache.org/repos/asf/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext

Background:

Currently the Stanbol Enhancer still uses the FISE Enhancement
Structure. Changing this is unavoidable but will break all the current
clients.
Therefore the current plan is to keep using the current structure for
some time and to switch to a new one only once we also implement
new features that require an extended Enhancement Structure (e.g.
support for extracting metadata from parsed content).

As discussed with ogrisel: The Issues STANBOL-12 and STANBOL-48 can
and will be resolved by extending the current FISE Enhancement
Structure (and therefore without breaking existing clients)

Main Goals of this Proposal:

 - start the discussion early and give people time to contribute
 - inspire usage scenarios to catch as many requirements as possible
 - propose solutions for shortcomings and missing features of the FISE
Enhancement Structure

As a reminder:

The biggest shortcoming of the current FISE Enhancement Structure was
how complex it is to consume (understand/parse/query) on the client
side. This can - to some extent - be improved by providing clients,
but a good design of the Enhancement Structure will always be
central to the ease of use of the Stanbol Enhancer component.

In my opinion the ease of use depends on a lot of things, including
 - human readable default serialisation (JSON-LD): A flat structure
that uses fewer resources, each with a lot of properties, would help
with that. Small pieces of information that link to each other and are
randomly distributed over the whole file are a disaster typical of
much serialised RDF data, and something we must aim to avoid.
 - easy to read/write and modify (SPARQL) queries
 - meaningful property and concept names
 - usage of well known and understood metadata standards such as Dublin Core

best
Rupert Westenthaler

-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

BTW, if you want to edit the above diagram to propose refinements,
please use: https://creately.com/diagram/gkvbmoey1/VxUlwG3VHYOuNPwkurI6WXQBJfo%3D


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <rw...@apache.org>.

Based on this discussion I tried to describe the sb:Suggestion
concept. Suggestions and comments are very welcome.

NOTE: The remaining email is formatted using markdown syntax because I
expect to use this as part of the Stanbol Enhancement Structure
Specification

- - - - -

=== sb:Suggestion ===

Suggestions are used by the Stanbol Enhancer to suggest possible
values for the resolution of features extracted from the parsed content.
Currently there are two different use cases defined for Suggestions:

*(1) Entity Resolution:* Suggests entities for a Feature extracted
from the content. Typically such suggestions are calculated based on
the name of the feature found within the content (e.g. the selected
text of a sb:TextOccurrence).

*(2) Field Value Suggestion:* Suggests a value for a specific property.
This kind of suggestion is useful if a relation between two
extracted features is detected. A typical example would be a person
"Steve Jobs" with the role "CEO" of the company "Apple Inc". Such
relations can be detected by NLP tools. However, suggestions like this
are also central to the semantic lifting of RDFa annotations, as shown
in the example below.

sb:Suggestion uses the following properties:

 * sb:entity: The id of the suggested Entity
 * sb:entity-type: The type(s) of the suggested Entity
 * sb:confidence: Needed to sort in case of multiple suggestions
 * sb:field: Defines the property for which this suggestion should
become the value if accepted by the user

In addition all sb:Suggestions are also of type sb:Enhancement to
allow EnhancementEngines to provide enhancement metadata for them.
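
As a rough illustration of how a client might hold these properties, here is a minimal Python sketch. The class name, the attribute names and the `rank` helper are assumptions made for this illustration and are not part of the proposal; the entity URIs and confidence values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    """Hypothetical client-side view of one sb:Suggestion."""
    entity: str                   # sb:entity
    entity_types: List[str]       # sb:entity-type (one or more)
    confidence: float             # sb:confidence
    field: Optional[str] = None   # sb:field, only set for field value suggestions


def rank(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Sort suggestions so that the most confident one comes first."""
    return sorted(suggestions, key=lambda s: s.confidence, reverse=True)


# Two made-up candidates for the same feature.
candidates = [
    Suggestion("<http://www.example.com/person/A>", ["foaf:Person"], 0.4),
    Suggestion("<http://www.example.com/person/B>", ["foaf:Person"], 0.9),
]
best = rank(candidates)[0]
```

Sorting by sb:confidence is the only ordering the property list implies; how a client presents the ranked list is left open here.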

For details on how they are used, please see the following example.

==== Example ====

As an example, let's assume that the following RDFa annotated content is
passed to the Stanbol Enhancer:

    <span typeof="cal:Vevent">
        <h3 property="dc:title"> Stanbol Teleconference </h3>
        <span property="cal:summary">
            <p> Agenda: </p>
            <ul>
                <li> ... </li>
            </ul>
            <p> Participants: </p>
            <ul>
                <li typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
                <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
                <li> ... </li>
            </ul>
        </span>
    </span>

(1) Suggest the Entities for Rupert and Olivier
(2) Suggest to link Rupert and Olivier as values for "cal:attendee"

Both for Rupert Westenthaler and Olivier Grisel an EntityAnnotation
would be present - in that case created by the RDFa extractor, but in
principle this could also work if the RDFa markup is missing. In such
cases the EntityAnnotations could be created by an
NLPEnhancementEngine.

    <a1> rdf:type sb:EntityAnnotation
    <a1> dc:title "Rupert Westenthaler"
    <a1> sb:entity-type foaf:Person
    <a1> sb:hasOccurrence <o1>
    <a1> sb:hasSuggestion <s1>

    <a2> rdf:type sb:EntityAnnotation
    <a2> dc:title "Olivier Grisel"
    <a2> sb:entity-type foaf:Person
    <a2> sb:hasOccurrence <o2>
    <a2> sb:hasSuggestion <s2>

Let's ignore the occurrences - because how to create Occurrences for
RDFa markup is a whole different story that needs to be specified -
and concentrate on the suggestions.

    <s1> rdf:type sb:Suggestion
    <s1> sb:entity <http://www.example.com/person/Rupert_Westenthaler>
    <s1> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
    <s1> sb:confidence 123,456

    <s2> rdf:type sb:Suggestion
    <s2> sb:entity <http://www.example.com/person/Olivier_Grisel>
    <s2> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
    <s2> sb:confidence 234,567

If the suggestion is accepted by the client the RDFa markup could be
updated like this

    <li about="http://www.example.com/person/Rupert_Westenthaler"
        typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
    <li about="http://www.example.com/person/Olivier_Grisel"
        typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>

Now let's have a detailed look at the suggestions to add Rupert and
Olivier as a "cal:attendee" to the meeting.
First we need an EntityAnnotation for the meeting, which would
be created by the RDFa extractor:

    <a> rdf:type sb:EntityAnnotation
    <a> dc:title "Stanbol Teleconference"
    <a> sb:entity-type cal:Vevent
    <a> sb:hasOccurrence <o>
    <a> sb:hasSuggestion <s3>
    <a> sb:hasSuggestion <s4>

Again let's skip the occurrence and look at the two suggestions.
What I want to do here is to suggest using the Annotations for Rupert
(<a1>) and Olivier (<a2>) as values for the property "cal:attendee".
It is important to suggest the annotations <a1> and <a2> as
values here and NOT the suggested entities (e.g.
<http://www.example.com/person/Rupert_Westenthaler> in case of <a1>)
because the Stanbol Enhancer cannot assume that the user will accept
the suggestions <s1> for <a1> and <s2> for <a2>.
The following suggestions also use the sb:field property to tell the
user that the suggestion is about values for the "cal:attendee"
property.

    <s3> rdf:type sb:Suggestion
    <s3> sb:field cal:attendee
    <s3> sb:entity <a1>
    <s3> sb:entity-type sb:EntityAnnotation
    <s3> sb:confidence 12,34

    <s4> rdf:type sb:Suggestion
    <s4> sb:field cal:attendee
    <s4> sb:entity <a2>
    <s4> sb:entity-type sb:EntityAnnotation
    <s4> sb:confidence 12,34

NOTE:

 * I am not sure if it is a good idea to use "sb:entity" to link to an
annotation created by the Stanbol Enhancer because it might confuse
users if the same property is used to link external and internal
resources. However, introducing an additional property such as
"sb:value" does not seem better either.

Here is the RDFa markup if the user accepts <s3> and <s4> but not <s1> and <s2>:

    <span typeof="cal:Vevent">
        [...]
        <p> Participants: </p>
        <ul property="cal:attendee">
            <li typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
            <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
            <li> ... </li>
        </ul>
    </span>

And finally the RDFa markup if all suggestions are accepted on the
client side:

    <span typeof="cal:Vevent">
        [...]
        <p> Participants: </p>
        <ul property="cal:attendee">
            <li about="http://www.example.com/person/Rupert_Westenthaler"
                typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
            <li about="http://www.example.com/person/Olivier_Grisel"
                typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
        </ul>
    </span>

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/7 Rupert Westenthaler <ru...@gmail.com>:
> On Mon, Mar 7, 2011 at 12:17 PM, Olivier Grisel
> <ol...@ensta.org> wrote:
>> 2011/3/7 Rupert Westenthaler <rw...@apache.org>:
>>>
>>> or (3) force implementors of EnhancementEngines to add both
>>> sb:Annotation and sb:EntityAnnotation as rdf:type.
>>>
>>> None of these three solutions seems very promising because
>>>  - users will not want to enable reasoning
>>>  - UNIONs make queries very complex, and what happens if we add
>>> another Annotation type?
>>>  - adding multiple rdf:types would be a nice solution for the client
>>> side, but has the danger that some functionality will break if an
>>> EnhancementEngine does not add the additional type.
>>
>> I agree we should not impose UNIONs or RDFS reasoning on the client
>> side. I had solution 3 in mind: every time you add an annotation that
>> is about the identification of an Entity in the text (as opposed to
>> other types of annotations such as finding the topic of the content
>> item, the language, a keyphrase, ...), the enhancement engine should
>> add both rdf:type sb:Annotation and rdf:type sb:EntityAnnotation. If
>> it does not, it's considered a bug and has to be fixed.
>>
> If we aim in that direction, we should create a test framework as part
> of the Stanbol Enhancer. EnhancementEngines would then need to pass
> all the tests defined by this framework.
> I think it should not be hard to write simple unit tests that check if
> Enhancements created by EnhancementEngines are in line with the
> Enhancement Structure.

+1 We will also update the Java helpers to generate annotations that
are consistent with the model.


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <ru...@gmail.com>.

On Mon, Mar 7, 2011 at 12:17 PM, Olivier Grisel
<ol...@ensta.org> wrote:
> 2011/3/7 Rupert Westenthaler <rw...@apache.org>:
>>
>> or (3) force implementors of EnhancementEngines to add both
>> sb:Annotation and sb:EntityAnnotation as rdf:type.
>>
>> None of these three solutions seems very promising because
>>  - users will not want to enable reasoning
>>  - UNIONs make queries very complex, and what happens if we add
>> another Annotation type?
>>  - adding multiple rdf:types would be a nice solution for the client
>> side, but has the danger that some functionality will break if an
>> EnhancementEngine does not add the additional type.
>
> I agree we should not impose UNIONs or RDFS reasoning on the client
> side. I had solution 3 in mind: every time you add an annotation that
> is about the identification of an Entity in the text (as opposed to
> other types of annotations such as finding the topic of the content
> item, the language, a keyphrase, ...), the enhancement engine should
> add both rdf:type sb:Annotation and rdf:type sb:EntityAnnotation. If
> it does not, it's considered a bug and has to be fixed.
>
If we aim in that direction, we should create a test framework as part
of the Stanbol Enhancer. EnhancementEngines would then need to pass
all the tests defined by this framework.
I think it should not be hard to write simple unit tests that check if
Enhancements created by EnhancementEngines are in line with the
Enhancement Structure.
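
To make the idea of such a check a bit more concrete, here is a minimal Python sketch of one possible conformance rule: every resource typed as sb:EntityAnnotation must also carry the sb:Annotation type. The function name, the representation of the engine output as a set of (subject, predicate, object) tuples and the sample data are all assumptions for illustration; an actual test framework for the Stanbol Enhancer would presumably be written in Java against the real RDF API.

```python
RDF_TYPE = "rdf:type"


def missing_base_type(graph, subtype="sb:EntityAnnotation", base="sb:Annotation"):
    """Return subjects typed as `subtype` that lack the `base` rdf:type."""
    typed = {(s, o) for (s, p, o) in graph if p == RDF_TYPE}
    return sorted(s for (s, o) in typed if o == subtype and (s, base) not in typed)


# Output of a hypothetical engine: <a1> is fine, <a2> lacks sb:Annotation.
engine_output = {
    ("<a1>", RDF_TYPE, "sb:Annotation"),
    ("<a1>", RDF_TYPE, "sb:EntityAnnotation"),
    ("<a2>", RDF_TYPE, "sb:EntityAnnotation"),
}
violations = missing_base_type(engine_output)
```

A unit test for an engine would then simply assert that the list of violations is empty.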

best
Rupert


-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/7 Rupert Westenthaler <rw...@apache.org>:
>
> or (3) force implementors of EnhancementEngines to add both
> sb:Annotation and sb:EntityAnnotation as rdf:type.
>
> None of these three solutions seems very promising because
>  - users will not want to enable reasoning
>  - UNIONs make queries very complex, and what happens if we add
> another Annotation type?
>  - adding multiple rdf:types would be a nice solution for the client
> side, but has the danger that some functionality will break if an
> EnhancementEngine does not add the additional type.

I agree we should not impose UNIONs or RDFS reasoning on the client
side. I had solution 3 in mind: every time you add an annotation that
is about the identification of an Entity in the text (as opposed to
other types of annotations such as finding the topic of the content
item, the language, a keyphrase, ...), the enhancement engine should
add both rdf:type sb:Annotation and rdf:type sb:EntityAnnotation. If
it does not, it's considered a bug and has to be fixed.

The goal is to make it more explicit to the client what kind of
annotation it is: it can be useful to define a mapping from
annotation sub-type to an icon in the client UI, for instance. If we
do not make this explicit, the client will have to implement many
rules that test for attribute presence to guess the subtype and display
the right icon in the UI.

> Therefore I would suggest to just define sb:Annotation in the
> Enhancement Structure. If someone is only interested in
> EntityAnnotations, he only needs to add the sb:entity and the
> sb:entity-type properties as required constraints to the query.
>
> select ?entityAnnotation ?title ?entity ?type
> where {
>   ?entityAnnotation rdf:type sb:Annotation .
>   ?entityAnnotation dc:title ?title .
>   ?entityAnnotation sb:entity ?entity .
>   ?entityAnnotation sb:entity-type ?type
> }
>
>
> Even if concepts like "sb:EntityAnnotation" are not part of the
> enhancement structure, we can add them by defining them (formally) in
> a separate model that users can load if they want to use reasoning.
> Such a model would use OWL as an ontology language and also assume
> that an OWL reasoner is present when it is used.
> Within such a model the concept "sb:EntityAnnotation" would be
> defined by declaring that it is an "sb:Annotation" with two
> owl:someValuesFrom restrictions for "sb:entity" and "sb:entity-type".
> In addition it would define "sb:entity" as an owl:FunctionalProperty
> (meaning that it can only have a single value).
> An OWL reasoner would now be able to deduce that an
> "sb:Annotation" instance with a value for sb:entity and one or
> more values for sb:entity-type is actually an "sb:EntityAnnotation"
> instance.

I agree we can have an intensional definition of the sb:EntityAnnotation
sub-type, but that imposes a further burden on the client, who must
either:
- have an OWL reasoner (and understand how OWL reasoning works and the
intensional type definition approach)
- be a master of SPARQL, playing with OPTIONAL and FILTER (which is BTW
probably less efficiently computed on the server side than explicit
queries on materialized sub-types).

My motivation for introducing the sb:EntityAnnotation sub-type is to
make the model more explicit, to reduce the cognitive pressure on the
client and the expectations put on the query plan optimizer on the
server.

> About sb:Suggestion
>> For Suggestion I would rather name it LinkingSuggestion,
>> LinkSuggestion or ResolutionSuggestion to make it explicit that
>> this type of enhancement is about suggesting to link to a resource
>> from the LOD cloud or from a private LinkedData knowledge base (using
>> the entityhub as proxy).
>
> Defining sb:LinkSuggestion and sb:ResolutionSuggestion as
> rdfs:subClassOf sb:Suggestion would cause the same problems as
> described for sb:Annotation. Therefore I would recommend to use only
> sb:Suggestion and use a property to distinguish between these two
> types.
>
> The sb:LinkSuggestion and sb:ResolutionSuggestion concepts can be
> defined in the Reasoning (OWL) model if needed.

My proposal was not to sub-type sb:Suggestion with
sb:LinkSuggestion but to rename it. sb:LinkSuggestion would be a
top-level type in my model, unless you have other kinds of suggestions
that are not about linking with resources.

But I agree we can keep the sb:Suggestion name if you prefer.

As for Occurrences we agree, hence no further comments here.

Regards,

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <rw...@apache.org>.

Hi

On Sun, Mar 6, 2011 at 11:48 PM, Olivier Grisel
<ol...@ensta.org> wrote:
> Ok for removing Entity in the base class names. But I would also like
> to have a subclass EntityAnnotation with the properties sb:entity-type
> defined on it.

I would recommend NOT using sub-classes (and sub-properties) for the
enhancement structure wherever possible. I will try to explain
why, and even provide a solution for how to introduce them later (if
necessary).

First let's assume that both "sb:Annotation" and "sb:EntityAnnotation"
are defined and that "sb:EntityAnnotation rdfs:subClassOf
sb:Annotation". Some EnhancementEngines will create instances of
sb:Annotation, while others will create "sb:EntityAnnotation"
instances.

Now let's look at what happens if a user wants to query for all
annotations. Most likely he would look up the documentation, find the
sb:Annotation concept and create a query like

select ?annotation ?title
where {
   ?annotation rdf:type sb:Annotation .
   ?annotation dc:title ?title
}

Maybe he is also interested in providing a link to the entity - if
available - and modifies the query like

select ?annotation ?title ?entity
where {
   ?annotation rdf:type sb:Annotation .
   ?annotation dc:title ?title .
   optional {
      ?annotation sb:entity ?entity
   }
}

To his surprise such queries would not return all sb:Annotations and,
even more, the second query would not return a single result with a
link to an entity. The reason is that there will not be an RDFS
reasoner on the client side, and therefore the information that
an sb:EntityAnnotation is also an sb:Annotation is just not present.
Therefore the above queries will just return the sb:Annotation
instances but not the sb:EntityAnnotations.
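
The effect can be reproduced with a few lines of Python. This is only a sketch: the graph is modelled as a plain set of (subject, predicate, object) tuples, `<a1>` and `<a3>` are made-up annotations, and `with_subclass_types` is a hypothetical stand-in for the single RDFS rule that matters here (instances of a sub-class are also instances of its super-class). A real reasoner does considerably more.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# <a1> was typed with the sub-class only, <a3> directly as sb:Annotation.
graph = {
    ("<a1>", RDF_TYPE, "sb:EntityAnnotation"),
    ("<a3>", RDF_TYPE, "sb:Annotation"),
    ("sb:EntityAnnotation", SUBCLASS_OF, "sb:Annotation"),
}


def instances_of(triples, cls):
    """Plain pattern match on rdf:type, as a store without reasoning does."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == cls)


def with_subclass_types(triples):
    """Add the rdf:type triples implied by rdfs:subClassOf (to a fixpoint)."""
    inferred = set(triples)
    while True:
        new = {
            (s, RDF_TYPE, sup)
            for (s, p, o) in inferred if p == RDF_TYPE
            for (sub, p2, sup) in inferred if p2 == SUBCLASS_OF and sub == o
        } - inferred
        if not new:
            return inferred
        inferred |= new


without_reasoner = instances_of(graph, "sb:Annotation")
with_reasoner = instances_of(with_subclass_types(graph), "sb:Annotation")
```

Without the inference step only `<a3>` is found; `<a1>` shows up only once the implied type triple has been added.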

To get both he would need to (1) enable an RDFS reasoner, (2) manually
merge the different annotation types by using UNIONs in the queries

where {
   {
      { ?annotation rdf:type sb:Annotation }
      UNION
      { ?annotation rdf:type sb:EntityAnnotation}
   }
   ?annotation dc:title ?title .
   optional {
      ?annotation sb:entity ?entity
   }
}

or (3) force implementors of EnhancementEngines to add both
sb:Annotation and sb:EntityAnnotation as rdf:type.

None of these three solutions seems very promising because
 - users will not want to enable reasoning
 - UNIONs make queries very complex, and what happens if we add
another Annotation type?
 - adding multiple rdf:types would be a nice solution for the client
side, but has the danger that some functionality will break if an
EnhancementEngine does not add the additional type.

Therefore I would suggest to just define sb:Annotation in the
Enhancement Structure. If someone is only interested in
EntityAnnotations, he only needs to add the sb:entity and the
sb:entity-type properties as required constraints to the query.

select ?entityAnnotation ?title ?entity ?type
where {
   ?entityAnnotation rdf:type sb:Annotation .
   ?entityAnnotation dc:title ?title .
   ?entityAnnotation sb:entity ?entity .
   ?entityAnnotation sb:entity-type ?type
}

Even if concepts like "sb:EntityAnnotation" are not part of the
enhancement structure, we can add them by defining them (formally) in
a separate model that users can load if they want to use reasoning.
Such a model would use OWL as an ontology language and also assume
that an OWL reasoner is present when it is used.
Within such a model the concept "sb:EntityAnnotation" would be
defined by declaring that it is an "sb:Annotation" with two
owl:someValuesFrom restrictions for "sb:entity" and "sb:entity-type".
In addition it would define "sb:entity" as an owl:FunctionalProperty
(meaning that it can only have a single value).
An OWL reasoner would now be able to deduce that an
"sb:Annotation" instance with a value for sb:entity and one or
more values for sb:entity-type is actually an "sb:EntityAnnotation"
instance.

related to this paragraph
> [..] If I remember
> correctly, OpenCalais has also a notion of unresolved entity (or local
> / ambiguous entity) with potentially many occurrences in the document
> and optionally links to "resolved" entities (for famous person,
> organizations and places) that occur in many documents.
I would define the following concepts within the reasoning model:

sb:LocalEntity
 - rdfs:subClassOf sb:Annotation
 - owl:someValuesFrom sb:entity-type
 - owl:cardinality=0 sb:entity
 - owl:cardinality=0 sb:hasSuggestion

sb:AmbiguousEntity
 - rdfs:subClassOf sb:Annotation
 - owl:someValuesFrom sb:entity-type
 - owl:cardinality=0 sb:entity
 - owl:someValuesFrom sb:hasSuggestion

sb:ResolvedEntity
 - rdfs:subClassOf sb:Annotation
 - owl:someValuesFrom sb:entity-type
 - owl:someValuesFrom sb:entity

Clients without reasoning support would need to use the following
SPARQL queries to get the same functionality:

For listing only Annotations representing "sb:LocalEntity" instances:

select ?localEntity ?title ?type
where {
   ?localEntity rdf:type sb:Annotation .
   ?localEntity dc:title ?title .
   ?localEntity sb:entity-type ?type .
   # each OPTIONAL is matched on its own; the FILTER sits outside so
   # that it keeps only annotations where neither pattern matched
   optional { ?localEntity sb:entity ?entity }
   optional { ?localEntity sb:hasSuggestion ?suggestion }
   FILTER(!bound(?entity) && !bound(?suggestion))
}

For listing only Annotations representing "sb:AmbiguousEntity" instances:

select ?localEntity ?title ?type ?suggestion
where {
   ?localEntity rdf:type sb:Annotation .
   ?localEntity dc:title ?title .
   ?localEntity sb:entity-type ?type .
   ?localEntity sb:hasSuggestion ?suggestion .
   # keep only annotations without a resolved entity
   optional { ?localEntity sb:entity ?entity }
   FILTER(!bound(?entity))
}

For listing only Annotations representing "sb:ResolvedEntity" instances:

select ?localEntity ?title ?type ?entity
where {
   ?localEntity rdf:type sb:Annotation .
   ?localEntity dc:title ?title .
   ?localEntity sb:entity-type ?type .
   ?localEntity sb:entity ?entity .
}
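
The three cases distinguished by the queries above boil down to a check on which properties are present. A minimal Python sketch of that decision follows; the `classify` function, the tuple-based graph and the `<x1>` to `<x3>` sample annotations are assumptions for illustration only, and the dc:title constraint used in the queries is left out for brevity.

```python
def classify(graph, annotation):
    """Return the reasoning-model class name for one annotation, or None."""
    props = {p for (s, p, o) in graph if s == annotation}
    is_annotation = (annotation, "rdf:type", "sb:Annotation") in graph
    if not is_annotation or "sb:entity-type" not in props:
        return None
    if "sb:entity" in props:
        return "sb:ResolvedEntity"      # has an entity, suggestions do not matter
    if "sb:hasSuggestion" in props:
        return "sb:AmbiguousEntity"     # no entity, but at least one suggestion
    return "sb:LocalEntity"             # neither entity nor suggestion


graph = {
    ("<x1>", "rdf:type", "sb:Annotation"),
    ("<x1>", "sb:entity-type", "foaf:Person"),
    ("<x2>", "rdf:type", "sb:Annotation"),
    ("<x2>", "sb:entity-type", "foaf:Person"),
    ("<x2>", "sb:hasSuggestion", "<s1>"),
    ("<x3>", "rdf:type", "sb:Annotation"),
    ("<x3>", "sb:entity-type", "foaf:Person"),
    ("<x3>", "sb:entity", "<http://www.example.com/person/Olivier_Grisel>"),
}
kinds = {a: classify(graph, a) for a in ("<x1>", "<x2>", "<x3>")}
```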

About sb:Suggestion
> For Suggestion I would rather name it LinkingSuggestion,
> LinkSuggestion or ResolutionSuggestion to make it explicit that
> this type of enhancement is about suggesting to link to a resource
> from the LOD cloud or from a private LinkedData knowledge base (using
> the entityhub as proxy).

Defining sb:LinkSuggestion and sb:ResolutionSuggestion as
rdfs:subClassOf sb:Suggestion would cause the same problems as
described for sb:Annotation. Therefore I would recommend to use only
sb:Suggestion and use a property to distinguish between these two
types.

The sb:LinkSuggestion and sb:ResolutionSuggestion concepts can be
defined in the Reasoning (OWL) model if needed.

About Occurrences:

For Occurrences sub-classing will not be avoidable, because
occurrences in different content types (Text, Images, Video, Audio,
Metadata) need to be very different. However, these large differences
also make it less likely that users will need queries that cover
all kinds of Occurrences. Nonetheless the current proposal of the
enhancement structure already states that enhancement engines also
need to add the "?occurrence rdf:type sb:Occurrence" statement if they
create an instance of a more specific type.

A possible workaround to query for all Occurrences without depending
on all EnhancementEngines adding the "?occurrence
rdf:type sb:Occurrence" statement would be to use the incoming
"?annotation sb:hasOccurrence ?occurrence" triple, as shown in the
following example

select ?occurrence ?annotation
where {
   ?annotation sb:hasOccurrence ?occurrence
}

This would provide a list of all occurrences regardless of the
rdf:type of the Occurrence. It would only filter out Occurrences that
are not linked to an sb:Annotation (something users will not be
interested in).

best
Rupert Westenthaler

-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/6 Rupert Westenthaler <ru...@gmail.com>:
> On Fri, Mar 4, 2011 at 5:55 PM, Olivier Grisel <ol...@ensta.org> wrote:
>
> Just to be sure ... in the diagram the sb:EntityAnnotation and the
> sb:TextOccurrence would be created by the NLP engine and the
> EntitySuggestion would be created by an EntityTaggingEngine (e.g. the
> current Autotagger).
>
> That would mean that the NLP Engine can detect that "John Smith" and
> "Mr Smith" are the same Entity? Your issue is therefore - and in such
> a case that would be correct - that a single Annotation would need to
> have multiple Occurrences.
> If we need to model something like that, then we need to create a
> separate resource for each Occurrence and link them to the Annotation
> with a relation, e.g. "sb:hasOccurrence".

Yes, this is roughly what I described in the diagram.

> Regarding the naming I would suggest to remove the "Entity". This
> would mean to use "Annotation", "Suggestion" and "Occurrence" with
> sub-types "TextOccurrence", "MetadataOccurrence" ...

Ok for removing Entity in the base class names. But I would also like
to have a subclass EntityAnnotation with the properties sb:entity-type
defined on it.

For Suggestion I would rather name it LinkingSuggestion,
LinkSuggestion or ResolutionSuggestion to make it explicit that
this type of enhancement is about suggesting to link to a resource
from the LOD cloud or from a private LinkedData knowledge base (using
the entityhub as proxy).

> If you confirm that we need multiple TextOccurrences for a single
> Annotation, then I will make the corresponding changes to the
> specification.

Yes I confirm this. The Temis annotation engine does the same kind
of modeling (except that resolution is somehow implicit). If I remember
correctly, OpenCalais also has a notion of unresolved entity (or local
/ ambiguous entity) with potentially many occurrences in the document
and optionally links to "resolved" entities (for famous persons,
organizations and places) that occur in many documents.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

Ok I gave the enhancement structure a deeper look and I see one major issue.

Suppose we have the sentence:

"John Smith, CEO of Smith Consulting Ltd. declared to the press..."

And later in the same document:

"Mr Smith further announced..."

The current implementation of Named Entity detection will detect that
"John Smith" and "Mr Smith" are occurrences of the same named entity
(using a subsumption relationship between the labels). Thereafter the
engine in charge of looking up those entities in Wikipedia might or
might not find entity link suggestions, but it does so for all
occurrences at once instead of replicating the same entity suggestions
for each occurrence of the same entity in a given document.

To be able to represent this with the Stanbol Enhancement structure I
would suggest something like:

  https://creately.com/diagram/gkvbmoey1/UN4phlmAlny9qBWOPuN3u8qeVfE%3D

For a given sb:EntityAnnotation we expect at least one
sb:TextOccurrence and 0 to many sb:EntitySuggestion.
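
To illustrate the shape of this structure (one annotation, one or more occurrences, zero or more suggestions), here is a small Python sketch. The grouping heuristic is a deliberately naive, hypothetical stand-in for the label subsumption mentioned above (it strips a few titles and merges mentions whose remaining tokens are a subset of an existing label); it is not how the actual Named Entity detection works, and the class and function names are made up.

```python
from dataclasses import dataclass, field
from typing import List

TITLES = {"mr", "mrs", "ms", "dr"}


def tokens(label):
    """Lower-cased tokens of a label, without punctuation and titles."""
    return {t.strip(".,").lower() for t in label.split()} - TITLES


@dataclass
class EntityAnnotation:
    label: str
    occurrences: List[str] = field(default_factory=list)  # at least one
    suggestions: List[str] = field(default_factory=list)  # zero or more


def group_mentions(mentions):
    """Attach each mention to the first annotation whose label subsumes it."""
    annotations = []
    for mention in mentions:
        for ann in annotations:
            a, b = tokens(mention), tokens(ann.label)
            if a and b and (a <= b or b <= a):
                ann.occurrences.append(mention)
                if len(a) > len(b):
                    ann.label = mention  # keep the most specific label
                break
        else:
            annotations.append(EntityAnnotation(mention, [mention]))
    return annotations


result = group_mentions(["John Smith", "Mr Smith", "Apple Inc"])
```

Entity suggestions would then be looked up once per annotation and stored in its `suggestions` list, rather than once per occurrence.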

What do you think?

-- 
Olivier

Re: Stanbol Enhancement Structure (discussion)

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Rupert,

On Mon, Feb 28, 2011 at 10:59 PM, Rupert Westenthaler
<rw...@apache.org> wrote:
> ...The proposal can be found at
>  http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html

Can we publish it to http://incubator.apache.org/stanbol/ ? It will
show up there anyway as soon as someone publishes the site for another
reason.

-Bertrand

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <ru...@gmail.com>.

On Fri, Mar 4, 2011 at 12:54 PM, Stephane Gamard
<st...@salsadev.com> wrote:
> +1. This is an ongoing internal debate here at SD (what really is a tag versus a keyword vs keyterm concept, ... ). Is there some central thread where we can brainstorm?
>

I really like this blog post: http://stdout.be/2010/04/07/tags-dont-cut-it/



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Stephane Gamard <st...@salsadev.com>.

On Mar 1, 2011, at 7:28 PM, Olivier Grisel wrote:

> I agree, I find the dc:type property confusing, I would rather have
> sb:entity-type property with a list of recommended toplevel types
> stating that domain specific types are also acceptable but probably
> won't be handled by the sample clients.
> 
> Also for the Tag vs Entity vs Keyword debate, I would add Topics (as
> in skos:Concept) to the debate: for instance a document could be about
> a topic such as "Business" or "Sport" and a subtopic such as
> "Football". Those are no Entities per se, and do not have
> "occurrences" in the content items: this is a global qualifier for the
> content of the document.
> 
> I think the "keyword" or "key term" or "keyphrase" enhancement type is
> still interesting to reflect the most important terms or phrases of a
> document (as extracted by SalsaDev for instance) without trying to
> perform the disambiguation step required to link those terms to a
> particular meaning using a public LOD URI (such as a dbpedia or
> wordnet link for instance). Such ambiguous keyterms can still be
> useful to perform a "related" websearch query on Yahoo BOSS for
> instance.

+1. This is an ongoing internal debate here at SD (what really is a tag versus a keyword vs keyterm concept, ... ). Is there some central thread where we can brainstorm?

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/1 Enrico Daga <en...@gmail.com>:
>
>>> About annotations, I would remove a fixed list of entity type (Person,
>>> Organization, Location), since this is very related to the single
>>> engine and should be easily extend-able (or leave them but consider
>>> the possibility that some engine could extract "Fruit" without the
>>> need to change the ontology) .
>>
>> I do not agree with that. My counter arguments are
>>  - all NLP tools support this kind of entities
>>  - with only some types e.g. Person, Organizaiton, Location,
>> Activities one can cover a lot of detected Entities
>>  - for users that need to group detected entities it is much easier to
>> deal with a fixed list as to write there own clustering algorithm for
>> dealing with an extendable list. In my opinion the 4 above types +
>> others should be ok for most of the use cases.
>>  - If someone wants/needs to process the exact types of extracted
>> features he can anyway use the sb:entity-type. I accept especially
>> domain specific applications to have special support for Entities
>> using an type that is part of there domain ontology.
>>
>> Based on this "Banana" would end up in "other" and the type "Fruit"
>> would be available via the "sb:entity-type" property. An application
>> of a super market might however have an own "Fruit" category and the
>> Banana would show up there
>> To summarize my goal with the dc:type property is not to be flexible
>> nor semantically correct, but to make it easy for users to consume the
>> enhancements. The flexibility and extensibility is provided by
>> "sb:entity-type"
>>
>> Does that make sense to you?
> In principle not, but I agree that a fixed list is easier to adopt.
> My opinion is that the stanbol vocabulary should avoid domain-specific
> terms, and leaving them to the engines implementers.
> I also guess (but i cannot proof it ;) ) that if default provided
> engines uses some terms, then next to come engines will likely reuse
> those terms, to better support adoption. So i still do not see the
> need of a fixed set of entity types.

I agree: I find the dc:type property confusing. I would rather have an
sb:entity-type property with a list of recommended top-level types,
stating that domain-specific types are also acceptable but probably
won't be handled by the sample clients.

Also, for the Tag vs Entity vs Keyword debate, I would add Topics (as
in skos:Concept): for instance, a document could be about
a topic such as "Business" or "Sport" and a subtopic such as
"Football". Those are not Entities per se, and do not have
"occurrences" in the content items: each is a global qualifier for the
content of the document.

I think the "keyword" or "key term" or "keyphrase" enhancement type is
still interesting to reflect the most important terms or phrases of a
document (as extracted by SalsaDev for instance) without trying to
perform the disambiguation step required to link those terms to a
particular meaning using a public LOD URI (such as a dbpedia or
wordnet link for instance). Such ambiguous keyterms can still be
useful to perform a "related" websearch query on Yahoo BOSS for
instance.

I still could not find the time to read the full proposal.
Tomorrow I am working at a customer's site. I will give you more
feedback on Thursday.

(The same constraints apply to the persistencestore code review...)

>>> In a future version it would be nice to find a way to let engines
>>> declare which is the contribution they are going to provide (tagging?
>>> categorization? metadata? embedded knowledge?) and how (adding
>>> annotation roles? entity types? metadata fields?)
>>>
>> Year that sounds like an interesting idea.
> I have opened a Jira issue (STANBOL-107), it would be nice to start
> with some example, as olivier suggested, but in this moment I have no
> idea :)

Ok, no hurry. Let's work on the enhancement structure first and we
will see the engines description later. It is much less important
than the enhancement structure to the early adopters.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Enrico Daga <en...@gmail.com>.

On 1 March 2011 18:39, Rupert Westenthaler <rw...@apache.org> wrote:
> On Tue, Mar 1, 2011 at 5:23 PM, Enrico Daga <en...@gmail.com> wrote:
>> Hi Rupert, all,
>> first thank you for setup the proposal.
>> I have few considerations I want to share, at a first read.
>>
>> Abut annotation roles, between Tag/Keyword there is IMHO, no
>> difference, we should keep one.
>>
> That is something that needs to be discussed.
> When you look at "keywords" extracted e.g. by SalsaDev so one can
> clearly recognize a difference to suggested tags by other engines. I
> refer to the fact that keywords - in case of SlasaDev - usually refer
> to "normal" words that are somehow central within a document while
> Tags are usually much more related to some kind of Entities.
Maybe, but then I would not say Tag but Entity; a tag reminds me of
a label that I apply to classify the item...
OK, this needs to be discussed :)

>
>> Alongside Annotation and Enhancement, I would also consider to add
>> another concept: Embedded Knowledge, which should be fed into a
>> separate graph, this graph would then host any triples directly or
>> indirectly embedded in the content.
>>
>
> That was already discussed in Istanbul and it is very likely that we
> will implemented exactly like that, but such knowledge is not related
> to the enhancement structure and therefore not part of this
> specification.
It doesn't seem so to me: if the specification wants to model the
results of enhancement engines, I think that all kinds of engine results
should be taken into account. Actually, we would implement an RDFa
extractor in exactly the same way as the LocationEnhancementEngine.
This risks being confusing.

>
>> About annotations, I would remove a fixed list of entity type (Person,
>> Organization, Location), since this is very related to the single
>> engine and should be easily extend-able (or leave them but consider
>> the possibility that some engine could extract "Fruit" without the
>> need to change the ontology) .
>
> I do not agree with that. My counter arguments are
>  - all NLP tools support this kind of entities
>  - with only some types e.g. Person, Organizaiton, Location,
> Activities one can cover a lot of detected Entities
>  - for users that need to group detected entities it is much easier to
> deal with a fixed list as to write there own clustering algorithm for
> dealing with an extendable list. In my opinion the 4 above types +
> others should be ok for most of the use cases.
>  - If someone wants/needs to process the exact types of extracted
> features he can anyway use the sb:entity-type. I accept especially
> domain specific applications to have special support for Entities
> using an type that is part of there domain ontology.
>
> Based on this "Banana" would end up in "other" and the type "Fruit"
> would be available via the "sb:entity-type" property. An application
> of a super market might however have an own "Fruit" category and the
> Banana would show up there
> To summarize my goal with the dc:type property is not to be flexible
> nor semantically correct, but to make it easy for users to consume the
> enhancements. The flexibility and extensibility is provided by
> "sb:entity-type"
>
> Does that make sense to you?
In principle no, but I agree that a fixed list is easier to adopt.
My opinion is that the Stanbol vocabulary should avoid domain-specific
terms and leave them to the engine implementers.
I also guess (but I cannot prove it ;) ) that if the engines provided
by default use some terms, then later engines will likely reuse
those terms, to better support adoption. So I still do not see the
need for a fixed set of entity types.

>
>>
>> In a future version it would be nice to find a way to let engines
>> declare which is the contribution they are going to provide (tagging?
>> categorization? metadata? embedded knowledge?) and how (adding
>> annotation roles? entity types? metadata fields?)
>>
> Year that sounds like an interesting idea.
I have opened a Jira issue (STANBOL-107). It would be nice to start
with some examples, as Olivier suggested, but at the moment I have no
idea :)

Enrico

>
> thx for the feedback
>
> best
> Rupert
>
>
> --
> | Rupert Westenthaler                            rwesten@apache.org
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
Enrico Daga

--
http://www.enridaga.net
skype: enri-pan

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <rw...@apache.org>.

On Tue, Mar 1, 2011 at 5:23 PM, Enrico Daga <en...@gmail.com> wrote:
> Hi Rupert, all,
> first thank you for setup the proposal.
> I have few considerations I want to share, at a first read.
>
> Abut annotation roles, between Tag/Keyword there is IMHO, no
> difference, we should keep one.
>
That is something that needs to be discussed.
When you look at the "keywords" extracted by e.g. SalsaDev, you can
clearly recognize a difference from the tags suggested by other engines.
I refer to the fact that keywords - in the case of SalsaDev - usually
refer to "normal" words that are somehow central within a document,
while Tags are usually much more related to some kind of Entities.

> Alongside Annotation and Enhancement, I would also consider to add
> another concept: Embedded Knowledge, which should be fed into a
> separate graph, this graph would then host any triples directly or
> indirectly embedded in the content.
>

That was already discussed in Istanbul and it is very likely that we
will implement it exactly like that, but such knowledge is not related
to the enhancement structure and is therefore not part of this
specification.

> About annotations, I would remove a fixed list of entity type (Person,
> Organization, Location), since this is very related to the single
> engine and should be easily extend-able (or leave them but consider
> the possibility that some engine could extract "Fruit" without the
> need to change the ontology) .

I do not agree with that. My counterarguments are:
 - all NLP tools support these kinds of entities
 - with only a few types, e.g. Person, Organization, Location and
Activities, one can cover a lot of detected Entities
 - for users that need to group detected entities it is much easier to
deal with a fixed list than to write their own clustering algorithm for
dealing with an extensible list. In my opinion the 4 types above +
others should be OK for most of the use cases.
 - If someone wants/needs to process the exact types of extracted
features he can use the sb:entity-type anyway. I especially expect
domain-specific applications to have special support for Entities
using a type that is part of their domain ontology.

Based on this, "Banana" would end up in "other" and the type "Fruit"
would be available via the "sb:entity-type" property. An application
for a supermarket might, however, have its own "Fruit" category and the
Banana would show up there.
To summarize: my goal with the dc:type property is not to be flexible
or semantically correct, but to make it easy for users to consume the
enhancements. The flexibility and extensibility are provided by
"sb:entity-type".

Does that make sense to you?
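
The split between the two properties could be sketched roughly as
follows. This is only an illustration and not part of the proposal: the
dictionary layout, the "label" key and both helper functions are
assumptions; only the property names dc:type and sb:entity-type and the
"Banana"/"Fruit"/"other" example come from the text above.

```python
from collections import defaultdict

# Hypothetical annotations: every key and value is an assumption made for
# illustration, following the "Banana" example above.
annotations = [
    {"label": "Banana", "dc:type": "other", "sb:entity-type": ["Fruit"]},
    {"label": "Richard Cypher", "dc:type": "Person",
     "sb:entity-type": ["Person"]},
]


def group_by_fixed_type(items):
    """Generic client: group labels by the fixed dc:type value."""
    groups = defaultdict(list)
    for item in items:
        groups[item["dc:type"]].append(item["label"])
    return dict(groups)


def labels_with_entity_type(items, entity_type):
    """Domain-specific client: select labels by an exact sb:entity-type."""
    return [i["label"] for i in items if entity_type in i["sb:entity-type"]]


generic_view = group_by_fixed_type(annotations)
fruit_view = labels_with_entity_type(annotations, "Fruit")
print(generic_view)
print(fruit_view)
```

A generic client only ever needs the small, fixed set of dc:type keys,
while the supermarket application can still pick out its own "Fruit"
category without any change to the fixed list.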

>
> In a future version it would be nice to find a way to let engines
> declare which is the contribution they are going to provide (tagging?
> categorization? metadata? embedded knowledge?) and how (adding
> annotation roles? entity types? metadata fields?)
>
Yeah, that sounds like an interesting idea.

thx for the feedback

best
Rupert


-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/1 Enrico Daga <en...@gmail.com>:
>
> In a future version it would be nice to find a way to let engines
> declare which is the contribution they are going to provide (tagging?
> categorization? metadata? embedded knowledge?) and how (adding
> annotation roles? entity types? metadata fields?)

+1, but this could be handled as a separate task. Could you open a Jira
issue for the linked data / semantic description of the engines? At
least stating the goals and maybe giving a sample Turtle description of
existing engines, to get the discussion started on concrete examples?

I will give you my comments on the rest of Rupert's proposal in a
separate email (as soon as I finish reading it).

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Enrico Daga <en...@gmail.com>.

Hi Rupert, all,
first, thank you for setting up the proposal.
I have a few considerations I want to share after a first read.

About annotation roles: between Tag and Keyword there is, IMHO, no
difference; we should keep one.

Alongside Annotation and Enhancement, I would also consider adding
another concept: Embedded Knowledge, which should be fed into a
separate graph. This graph would then host any triples directly or
indirectly embedded in the content.

About annotations, I would remove the fixed list of entity types (Person,
Organization, Location), since it is closely tied to the individual
engine and should be easily extensible (or leave them, but consider
the possibility that some engine could extract "Fruit" without the
need to change the ontology).

In a future version it would be nice to find a way to let engines
declare which contribution they are going to provide (tagging?
categorization? metadata? embedded knowledge?) and how (adding
annotation roles? entity types? metadata fields?)

My 2 cents

Enrico

On 1 March 2011 16:52, Bertrand Delacretaz <bd...@apache.org> wrote:
> On Tue, Mar 1, 2011 at 2:44 PM, Olivier Grisel <ol...@ensta.org> wrote:
>> 2011/3/1 Bertrand Delacretaz <bd...@apache.org>:
>>>...
>>> The navigation pages need to be improved (like
>>> http://stanbol.staging.apache.org/stanbol/docs/trunk/ ) but at least
>>> everything's accessible.
>>
>> The navigation links are broken when not clicking from the home
>> package. We probably need absolute links...
>
> Yes, sorry, I fixed that earlier today but forgot to publish the site,
> should be all good now.
>
> -Bertrand
>

-- 
Enrico Daga

--
http://www.enridaga.net
skype: enri-pan

Re: Stanbol Enhancement Structure (discussion)

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Mar 1, 2011 at 2:44 PM, Olivier Grisel <ol...@ensta.org> wrote:
> 2011/3/1 Bertrand Delacretaz <bd...@apache.org>:
>>...
>> The navigation pages need to be improved (like
>> http://stanbol.staging.apache.org/stanbol/docs/trunk/ ) but at least
>> everything's accessible.
>
> The navigation links are broken when not clicking from the home
> package. We probably need absolute links...

Yes, sorry, I fixed that earlier today but forgot to publish the site,
should be all good now.

-Bertrand

Re: Stanbol Enhancement Structure (discussion)

Posted by Olivier Grisel <ol...@ensta.org>.

2011/3/1 Bertrand Delacretaz <bd...@apache.org>:
> On Tue, Mar 1, 2011 at 12:56 PM, Rupert Westenthaler <rw...@apache.org> wrote:
>> On Tue, Mar 1, 2011 at 9:58 AM, Bertrand Delacretaz
>> <bd...@apache.org> wrote:
>>>Can we publish it to http://incubator.apache.org/stanbol/ ? It will
>>> show up there anyway as soon as someone publishes the site for another
>>> reason.
>> For sure...
>
> Ok, it is published now at
> http://incubator.apache.org/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
>
> The navigation pages need to be improved (like
> http://stanbol.staging.apache.org/stanbol/docs/trunk/ ) but at least
> everything's accessible.

The navigation links are broken when not clicking from the home
package. We probably need absolute links such as:

 href="/stanbol/team.html"

instead of the current:

 href="team.html"
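
The difference between the two forms can be checked with the standard
urllib.parse.urljoin function in Python, which resolves a link against
the page it appears on. This is only a small sketch: the page URL is the
one published in this thread, and team.html is simply the example link
from above.

```python
from urllib.parse import urljoin

# Page on which the navigation links are rendered (URL taken from this thread).
PAGE = ("http://incubator.apache.org/stanbol/docs/trunk/enhancer/"
        "stanbolenhancementstructure.html")

# A relative link is resolved against the directory of the current page.
relative = urljoin(PAGE, "team.html")

# A root-relative link is resolved against the site root, whatever the page.
absolute = urljoin(PAGE, "/stanbol/team.html")

print(relative)
print(absolute)
```

The relative form ends up under docs/trunk/enhancer/, which would explain
why the links only work when followed from the top-level page.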

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Stanbol Enhancement Structure (discussion)

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Tue, Mar 1, 2011 at 12:56 PM, Rupert Westenthaler <rw...@apache.org> wrote:
> On Tue, Mar 1, 2011 at 9:58 AM, Bertrand Delacretaz
> <bd...@apache.org> wrote:
>>Can we publish it to http://incubator.apache.org/stanbol/ ? It will
>> show up there anyway as soon as someone publishes the site for another
>> reason.
> For sure...

Ok, it is published now at
http://incubator.apache.org/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html

The navigation pages need to be improved (like
http://stanbol.staging.apache.org/stanbol/docs/trunk/ ) but at least
everything's accessible.

-Bertrand

Re: Stanbol Enhancement Structure (discussion)

Posted by Rupert Westenthaler <rw...@apache.org>.

On Tue, Mar 1, 2011 at 9:58 AM, Bertrand Delacretaz
<bd...@apache.org> wrote:
>Can we publish it to http://incubator.apache.org/stanbol/ ? It will
> show up there anyway as soon as someone publishes the site for another
> reason.
For sure. When I finally managed to get all the markdown syntax right
it was already much too late and I was no longer in the mood to play
around with publishing.

On Tue, Mar 1, 2011 at 10:07 AM, Tommaso Teofili
<to...@gmail.com> wrote:
> Hi Rupert,
> I had a quick read of your proposal and I think it's good; the only thing I
> notice is that, if I understood it correctly, the Annotation object can be
> related to "something" not actually contained in the parsed content.
> So think for example to a Concept Annotation, then the concept is something
> abstract that can be "discovered" from the text of the content item but
> doesn't have any Occurrence in the parsed text so I wonder if Annotation is
> the proper name for that since Annotation makes me think to a span of text
> or data I can find in the parsed content. That (maybe) being a minor concern
> I like the proposal.

That is a valid point, and I would never have thought about it, because
my thinking was that Annotations are extracted from the ContentItem -
the interpretation of the content - and not from the Content - the
data. I agree that Metadata are not part of the Content, but they are
certainly part of the interpretation, and therefore there was no
problem in my mental model.

The whiteboard behind me still shows a concept named "Metadata", but I
had not liked it from the beginning, and when Andreas Gruber suggested
that I could instead model it as an annotation with an occurrence
within the metadata, I was really happy to get rid of it.
Maybe it is time to reintroduce it - but in a different manner.

All the Concepts used within this specification are not intended to be
processed by the users. They are mainly there to group useful sets of
properties (something like attribute groups in XSD or interfaces in
Java). The really important things are properties like
 - "dc:type": defining the type of the extracted feature
 - "dc:role": defining the role of the extracted feature and
 - "sb:entity": pointing to the definition of the extracted feature

And I think that's where the example "Enhancement of Metadata" got one
thing wrong.
It is not a good idea to define the annotation referring to the creator
of the document as "sb:Tag".
I refer to
> <a1> dc:title "Richard Cypher"
> <a1> dc:role sb:Tag
> <a1> dc:type: dbpedia-ont:Person
A document should not be tagged with its creator unless it is also
about the creator (e.g. in the case of a CV).
This Annotation has a different role: it provides information needed
for management. This should be reflected by the value of the
"dc:role" property. So maybe we should add a new role such as
"sb:Management".

However, this is not true for all features extracted from metadata.
E.g. when extracting the Artist, Title, Album and Genre from the ID3
metadata of an mp3 file, it makes complete sense to tag this audio
file with all these values. It really depends on the meaning of the
field, and EnhancementEngines specific to such standards should deal
with it.
Generally speaking, getting dc:role values right is not an easy task,
but I think this is OK because it will be one of the things that
distinguishes good from not so good enhancement engines. The important
thing is that the defined roles are clear and easily understandable by
users, because they will use them to filter enhancement results.
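
To make the filtering idea concrete, here is a rough sketch of how a
client might select results by role. Everything in it is an assumption
for illustration: the dictionary layout, the filter_by_role helper, the
"Example Artist" value and the "ex:Artist" type are made up, and
"sb:Management" is only the role suggested above, not an agreed part of
the vocabulary.

```python
# Hypothetical flat annotations keyed by the property names used above.
annotations = [
    # Creator of the document: management information, not a tag.
    {"dc:title": "Richard Cypher", "dc:role": "sb:Management",
     "dc:type": "dbpedia-ont:Person"},
    # Artist taken from ID3 metadata: tagging the audio file makes sense.
    # The title and the "ex:Artist" type are invented placeholders.
    {"dc:title": "Example Artist", "dc:role": "sb:Tag",
     "dc:type": "ex:Artist"},
]


def filter_by_role(items, role):
    """Return the annotations whose dc:role equals the requested role."""
    return [item for item in items if item.get("dc:role") == role]


tags = filter_by_role(annotations, "sb:Tag")
management = filter_by_role(annotations, "sb:Management")
print([a["dc:title"] for a in tags])
print([a["dc:title"] for a in management])
```

With clearly separated roles, a user interested only in tags never sees
the document creator, while a management view can ask for exactly that.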

best
Rupert Westenthaler

-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Stanbol Enhancement Structure (discussion)

Posted by Tommaso Teofili <to...@gmail.com>.

Hi Rupert,
I had a quick read of your proposal and I think it's good; the only thing I
notice is that, if I understood it correctly, the Annotation object can be
related to "something" not actually contained in the parsed content.
Think, for example, of a Concept Annotation: the concept is something
abstract that can be "discovered" from the text of the content item but
doesn't have any Occurrence in the parsed text, so I wonder if Annotation is
the proper name for that, since Annotation makes me think of a span of text
or data I can find in the parsed content. That (maybe) being a minor concern,
I like the proposal.
My 2 cents,
Tommaso

2011/2/28 Rupert Westenthaler <rw...@apache.org>

> Hi all,
>
> Today I committed a first proposal for the Stanbol Enhancement
> Structure that will replace the FISE Enhancement Structure currently
> used by the Stanbol Enhancer.
>
> The proposal can be found at
>
> http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.html
>
> The source is located at
>
> http://svn.apache.org/repos/asf/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
>
> Background:
>
> Currently the Stanbol Enhancer still uses the FISE Enhancement
> Structure. Changing this is unavoidable but will break all the current
> clients.
> Therefore the current plan is to keep using the current structure for
> some time and switch only to a new one as soon as we also implement
> new features that do require an extended Enhancement Structure (e.g.
> support for extracting metadata from parsed content)
>
> As discussed with ogrisel: The Issues STANBOL-12 and STANBOL-48 can
> and will be resolved by extending the current FISE Enhancement
> Structure (and therefore without breaking existing clients)
>
> Main Goals of this Proposal:
>
>  - start the discussion early and give peoples time to contribute
>  - inspire usage scenarios to catch as many requirements as possible
>  - propose solutions for shortcomings and missing features of the FISE
> Enhancement Structure
>
> As reminder:
>
> The biggest shortcoming of the current FISE Enhancement Structure was
> the complexity to consume (understand/parse/query) it on the client
> side. This can - to some extend - be improved by providing clients,
> but a good design of the Enhancement Structure will always be a
> central point for the ease of use of the Stanbol Enhancer component.
>
> I my opinion the easiness depends on a lot of things including
>  - human readable default serialisation (JSON-LD): A flat structure
> that uses less resources with a lot of properties would help with
> that. Having small pieces of information that link each other randomly
> distributed over the whole file is a disaster typically for many
> serialised RDF data and something we must aim to avoid.
>  - easy to read/write and modify (SPARQL) queries
>  - meaningful property and concept names
>  - usage of well known and understood metadata standards such as Dublin
> Core
>
> best
> Rupert Westenthaler
>
> --
> | Rupert Westenthaler                            rwesten@apache.org
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>