Posted to dev@stanbol.apache.org by Bertrand Delacretaz <bd...@apache.org> on 2012/05/09 01:06:58 UTC

[RT] SINR - a simplified client API for content enhancement

Hi,

I've been thinking recently that we could make Stanbol's content
enhancement services more accessible to the average developer by
providing a simplified POJO-like client API.

A secondary idea is to use that same API for other content enhancement
services, making it possible to combine them and/or make them
interchangeable.

This means losing the flexibility of RDF, but by using an Adapter
pattern we can remain sufficiently flexible while making it much
simpler to get started with Stanbol services.

The suggested name of this API is SINR (SINR Is Not RDF). Pronounced "sinner".

Here's an initial overview of what this could look like. Comments welcome.

Simple interfaces like Category, Annotation, Keyword are used to
represent content enhancements.

Here's Category, for example (credits to Reto for this one). Plain and simple:

interface Category {
  String getId();
  String getLabel();
  Category getParent();
}

To enhance content with categories and keywords, you call the
SinrEnhancer service like this:

InputStream content = ....
String mimeType = ...
// Specifying which enhancement types are desired allows the enhancer
// to avoid doing unnecessary work, while making it possible to define
// new types of enhancements later.
Class<?>[] desiredEnhancements = new Class<?>[] { Category.class, Keyword.class };
SinrResult r = enhancer.process(content, mimeType, desiredEnhancements);

An Adapter pattern allows you to convert the SinrResult to the various
data types:

List<Category> c = r.getResultsOfType(Category.class);
List<Keyword> k = r.getResultsOfType(Keyword.class);

With this pattern, new enhancement types can be added without changing
the SinrEnhancer interface.
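
A minimal sketch of how this type-keyed lookup could work. Everything here is hypothetical: the Keyword accessor, MapBackedResult, CannedEnhancer and the sample data are made up for illustration, and process() takes varargs for brevity (an array can still be passed). It only shows that results stored per type let new enhancement types appear without touching SinrEnhancer or SinrResult.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Stanbol.
final class SinrSketch {

    interface Category {
        String getId();
        String getLabel();
        Category getParent();
    }

    // Assumed shape; the proposal names Keyword but does not define it.
    interface Keyword {
        String getText();
    }

    interface SinrResult {
        <T> List<T> getResultsOfType(Class<T> type);
    }

    interface SinrEnhancer {
        SinrResult process(InputStream content, String mimeType, Class<?>... desiredEnhancements);
    }

    // Results are stored per type, so a new enhancement type needs no new method.
    static final class MapBackedResult implements SinrResult {
        private final Map<Class<?>, List<Object>> byType = new HashMap<>();

        <T> void add(Class<T> type, T value) {
            byType.computeIfAbsent(type, k -> new ArrayList<>()).add(value);
        }

        @Override
        public <T> List<T> getResultsOfType(Class<T> type) {
            List<T> out = new ArrayList<>();
            for (Object o : byType.getOrDefault(type, Collections.<Object>emptyList())) {
                out.add(type.cast(o));
            }
            return out;
        }
    }

    // Toy enhancer: returns canned data, and only for the requested types.
    static final class CannedEnhancer implements SinrEnhancer {
        @Override
        public SinrResult process(InputStream content, String mimeType, Class<?>... desired) {
            List<Class<?>> wanted = Arrays.asList(desired);
            MapBackedResult result = new MapBackedResult();
            if (wanted.contains(Category.class)) {
                result.add(Category.class, category("example:sports", "sports", null));
            }
            if (wanted.contains(Keyword.class)) {
                Keyword skiing = () -> "skiing";
                result.add(Keyword.class, skiing);
            }
            return result;
        }
    }

    static Category category(String id, String label, Category parent) {
        return new Category() {
            @Override public String getId() { return id; }
            @Override public String getLabel() { return label; }
            @Override public Category getParent() { return parent; }
        };
    }

    public static void main(String[] args) {
        SinrEnhancer enhancer = new CannedEnhancer();
        InputStream content = new ByteArrayInputStream("some text".getBytes());
        SinrResult r = enhancer.process(content, "text/plain", Category.class, Keyword.class);
        for (Category c : r.getResultsOfType(Category.class)) {
            System.out.println(c.getId() + " " + c.getLabel());
        }
        for (Keyword k : r.getResultsOfType(Keyword.class)) {
            System.out.println(k.getText());
        }
    }
}
```

Asking for a type that was not requested simply yields an empty list in this sketch, which is one possible way to keep the client code free of null checks.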

We might create two SINR implementations, one that talks to an OSGi
service directly and another one that talks to a Stanbol server over
HTTP.

WDYT?
-Bertrand

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi Rupert,

On Wed, May 9, 2012 at 1:44 AM, Rupert Westenthaler
<ru...@gmail.com> wrote:
> ...I agree that consuming the Stanbol Enhancements is currently the most complex
> part for the average developer. But that is true not only for Java; it applies to any
> programming language....

Agreed - I think the SINR idea could apply to other languages as well,
if needed.

My immediate concern is Java, so I prefer starting with something
concrete here, but if people are willing to create similar
implementations for other languages in parallel that would be cool.

>
> ...Because of that I would really like to tackle this issue not via a Java API, but rather on a
> level where all Stanbol users can benefit...

We can do both - as you say your format changes would help
implementing SINR, so they can go hand-in-hand IMO, but they are
different concerns.

-Bertrand

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi,

I agree that consuming the Stanbol Enhancements is currently the most complex part for the average developer. But that is true not only for Java; it applies to any programming language.

Because of that I would really like to tackle this issue not via a Java API, but rather on a level where all Stanbol users can benefit.

Here is how it could work ...

In the "RESTful Design Workshop" at the WWW2012 conference Markus Lanthaler gave a really nice presentation on JSON-LD [1]. While Apache Stanbol already uses JSON-LD [2] as its default serialization, the current usage is very basic: we just use JSON-LD as another RDF serialization format. However, JSON-LD would allow for much more customization and control of the serialized JSON by using the "@context".

However, improving the RDF -> JSON mapping alone will not be sufficient to transform the raw enhancement results into an easy-to-consume JSON structure. Because of that I suggest extending the "@context" with additional properties that define transformation rules to be applied to the enhancement results before the actual serialization. In the examples in this mail I will use LDPath [3], but one could also use Stanbol Rules, similar to the RefactorEngine, to achieve the same thing.

Let's use an example to describe the whole idea.

Everything starts with a JSON-LD @context like the following (this tries to resemble the example in Bertrand's original mail):

{
  "@context":
  {
    "enhancer": "http://fise.iks-project.eu/ontology/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "label": "skos:prefLabel",
    "parent":  {
        "@id": "skos:broader",
        "@type": "@id"
    },
    "categories": {
        "@id":"enhancer:suggested-topics",
        "@ldpath": "enhancer:extracted-from[rdf:type is enhancer:TopicEnhancement]/enhancer:referenced-entity",
        "@type":"skos:Concept",
        "@container":"@list"
    }
  }
}

This is a normal JSON-LD context as defined by [2], with a single exception: the "@ldpath" property in the JSON object describing "categories".
This LDPath statement is used to "transform" the more complex modeling used by the Stanbol Enhancement structure

    {content-item} -- extracted-from --> {topic-annotation-n} -- referenced-entity --> {category}

to the desired JSON structure

    {
        "@id" : "urn:content-item:SHA1-123456789",
        "categories" :  [{
            "@id" : "http://cv.iptc.org/newscodes/subjectcode/15002001",
            "label" : "downhill",
            "parent" : "http://cv.iptc.org/newscodes/subjectcode/15002000"
        }]
    }
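
To make the transformation concrete, here is a rough sketch of the traversal that the "@ldpath" statement describes, written as plain Java over an in-memory list of triples. It does not use the LDPath library; the Triple class, the annotation URNs and the sample graph are assumptions made up for illustration, and the edge direction follows the diagram above as drawn.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a hand-rolled traversal, not the LDPath library.
final class PathSketch {

    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // All objects o such that (subject, predicate, o) is in the graph.
    static List<String> objects(List<Triple> graph, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                out.add(t.object);
            }
        }
        return out;
    }

    // extracted-from[rdf:type is TopicEnhancement] / referenced-entity
    static List<String> categories(List<Triple> graph, String contentItem) {
        List<String> out = new ArrayList<>();
        for (String annotation : objects(graph, contentItem, "enhancer:extracted-from")) {
            boolean isTopic = objects(graph, annotation, "rdf:type")
                    .contains("enhancer:TopicEnhancement");
            if (isTopic) {
                out.addAll(objects(graph, annotation, "enhancer:referenced-entity"));
            }
        }
        return out;
    }

    // Made-up sample data: one topic annotation and one annotation of another type.
    static List<Triple> sampleGraph() {
        String item = "urn:content-item:SHA1-123456789";
        return Arrays.asList(
            new Triple(item, "enhancer:extracted-from", "urn:annotation:1"),
            new Triple("urn:annotation:1", "rdf:type", "enhancer:TopicEnhancement"),
            new Triple("urn:annotation:1", "enhancer:referenced-entity",
                       "http://cv.iptc.org/newscodes/subjectcode/15002001"),
            // This annotation is not a TopicEnhancement, so the type filter skips it.
            new Triple(item, "enhancer:extracted-from", "urn:annotation:2"),
            new Triple("urn:annotation:2", "rdf:type", "enhancer:TextAnnotation"),
            new Triple("urn:annotation:2", "enhancer:referenced-entity", "urn:example:ignored"));
    }

    public static void main(String[] args) {
        System.out.println(categories(sampleGraph(), "urn:content-item:SHA1-123456789"));
    }
}
```

The two-hop path collapses into a direct "categories" list on the content item, which is what the simplified JSON exposes.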

For Java users: implementing SINR should be easy using existing frameworks that support mapping Java objects to JSON (such as [4]).
However, I expect that similar frameworks are also available for other programming languages, and JavaScript will get native JSON objects.
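
As a hedged illustration of that mapping step, the sketch below assumes some JSON parser has already turned the simplified document into nested Maps and Lists, and copies the "categories" entries into plain objects. SimpleCategory and fromParsedJson are hypothetical names; a real mapping framework would do this work for you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assumes a JSON parser already produced Maps and Lists.
final class JsonCategorySketch {

    static final class SimpleCategory {
        final String id;
        final String label;
        final String parentId; // the JSON carries the parent as a URI, not as an object

        SimpleCategory(String id, String label, String parentId) {
            this.id = id;
            this.label = label;
            this.parentId = parentId;
        }
    }

    // Returns the value as a String, or null if it is missing or of another type.
    private static String asString(Object value) {
        return value instanceof String ? (String) value : null;
    }

    static List<SimpleCategory> fromParsedJson(Map<String, Object> document) {
        List<SimpleCategory> out = new ArrayList<>();
        Object raw = document.get("categories");
        if (raw instanceof List) {
            for (Object item : (List<?>) raw) {
                if (item instanceof Map) {
                    Map<?, ?> m = (Map<?, ?>) item;
                    out.add(new SimpleCategory(
                        asString(m.get("@id")),
                        asString(m.get("label")),
                        asString(m.get("parent"))));
                }
            }
        }
        return out;
    }

    // The example document from this mail, built by hand instead of parsed.
    static Map<String, Object> sampleDocument() {
        Map<String, Object> category = new LinkedHashMap<>();
        category.put("@id", "http://cv.iptc.org/newscodes/subjectcode/15002001");
        category.put("label", "downhill");
        category.put("parent", "http://cv.iptc.org/newscodes/subjectcode/15002000");

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("@id", "urn:content-item:SHA1-123456789");
        document.put("categories", Collections.singletonList(category));
        return document;
    }

    public static void main(String[] args) {
        for (SimpleCategory c : fromParsedJson(sampleDocument())) {
            System.out.println(c.label + " <" + c.id + "> parent=" + c.parentId);
        }
    }
}
```

Note that in this JSON the parent is only a URI, so resolving it to a full Category object would need an extra lookup.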

A positive side effect is that even users who want to process RDF (instead of JSON) will profit from the simpler RDF graphs produced by the @ldpath (or Stanbol Rule based) transformations specified in the @contexts.

So while this will make enhancement results much easier to consume, it also introduces a new weakness: the definition of the "@context", as this will be the new "most complex task for the average developer".
However, the working assumption here is that in most cases users will not need to define their own contexts, but can rather use existing ones that are included in the Stanbol distribution.

Such contexts would include:
    * the basic building blocks such as categories (the above example), named entities (TextAnnotations), linked entities (EntityAnnotations), mentions (positions within the content), ...
    * combinations of such patterns tailored for typical use cases such as entity tagging (e.g. [6]) or inline annotations (e.g. [7])

All those predefined contexts need to be available via the Stanbol web interface so that users can easily link/use them in requests (see [3] and especially the use of the Link header [5]).

The Stanbol infrastructure for this feature would include:

* available contexts should also be available via the OSGi environment (whiteboard pattern). The components performing the transformation will need that configuration
* the actual transformations based on the "@ldpath" instructions can be done by an EnhancementEngine in the post-processing phase
* for serialization we will need to update the JSON-LD serializer for Apache Clerezza to make it compliant with the recent changes/additions to the JSON-LD specification
    * maybe we can use [8] as a base, but currently it does not define any license (I already added an issue about that)
* most of the LDPath utilities already exist (see the enhancer/ldpath module)
* an implementation of SINR that directly accesses the Java API could be based on the transformed RDF graph


WDYT?
Rupert



[1] http://www.slideshare.net/lanthaler/jsonld-for-restful-services
[2] http://json-ld.org/spec/latest/json-ld-syntax/
[3] http://json-ld.org/spec/latest/json-ld-syntax/#the-context
[4] http://xstream.codehaus.org/json-tutorial.html
[5] http://json-ld.org/spec/latest/json-ld-syntax/#referencing-contexts-from-json-documents
[6] http://www.youtube.com/watch?v=957-bs16Fjg
[7] http://hallojs.org/annotate.html (press the annotate button)
[8] https://github.com/tristan/jsonld-java


On 09.05.2012, at 01:06, Bertrand Delacretaz wrote:

> Hi,
> 
> I've been thinking recently that we could make Stanbol's content
> enhancement services more accessible to the average developer by
> providing a simplified POJO-like client API.
> 
> A secondary idea is to use that same API for other content enhancement
> services, making it possible to combine them and/or make them
> interchangeable.
> 
> This means losing the flexibility of RDF, but by using an Adapter
> pattern we can remain sufficiently flexible while making it much
> simpler to get started with Stanbol services.
> 
> The suggested name of this API is SINR (SINR Is Not RDF). Pronounced "sinner".
> 
> Here's an initial overview of what this could look like. Comments welcome.
> 
> Simple interfaces like Category, Annotation, Keyword are used to
> represent content enhancements.
> 
> Here's Category, for example (credits to Reto for this one). Plain and simple:
> 
> interface Category {
>  String getId();
>  String getLabel();
>  Category getParent();
> }
> 
> To enhance content with categories and keywords, you call the
> SinrEnhancer service like this:
> 
> InputStream content = ....
> String mimeType = ...
> // Specifying which enhancement types are desired allows the enhancer
> // to avoid doing unnecessary work, while making it possible to define
> // new types of enhancements later.
> Class<?>[] desiredEnhancements = new Class<?>[] { Category.class, Keyword.class };
> SinrResult r = enhancer.process(content, mimeType, desiredEnhancements);
> 
> An Adapter pattern allows you to convert the SinrResult to the various
> data types:
> 
> List<Category> c = r.getResultsOfType(Category.class);
> List<Keyword> k = r.getResultsOfType(Keyword.class);
> 
> With this pattern, new enhancement types can be added without changing
> the SinrEnhancer interface.
> 
> We might create two SINR implementations, one that talks to an OSGi
> service directly and another one that talks to a Stanbol server over
> HTTP.
> 
> WDYT?
> -Bertrand


Re: [RT] SINR - a simplified client API for content enhancement

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Mon, Jun 4, 2012 at 12:30 PM, Pablo Mendes <pa...@gmail.com> wrote:
> I think this is where Rupert wanted JSON objects to be passed, so that no
> Java classes would need to be generated. Did I get it right?
>

I have a general interest in providing a simple way
for users to configure how JSON documents should be
extracted from an RDF graph, where "RDF graph" could also mean collecting
triples from different sources of the Linked Data cloud. This would
make it so much easier for the typical Web developer to consume RDF data.

And my current assumption is that combining

* JSON LD contexts (http://json-ld.org/spec/latest/json-ld-syntax/#the-context)
* Graph traversal/transformation language (LDPath or a subset of SPARQL)

could be used for implementing those.

I think we will meet anyway at the IKS community workshop in Salzburg,
so maybe this would be a good opportunity to discuss this.

best
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Pablo Mendes <pa...@gmail.com>.
Perhaps things like TextAnnotation, EntityAnnotation, Category, Language,
etc could have their own classes, while the domain specific stuff such as
Person, Location, Sports, etc could be represented by their URIs. If the
engine is returning properties alongside those URIs, perhaps a key-value
structure would be as far as SINR would go? Expressive access to the RDF
would still be accessible for the strong of heart, through another API.
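
A minimal sketch of that split, assuming a fixed class for the enhancement kind and URIs plus a key-value bag for the domain-specific part. The class name, the accessors and the sample property keys are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a fixed class for the enhancement kind, URIs for the domain part.
final class KeyValueSketch {

    static final class EntityAnnotation {
        private final URI entity;
        private final URI type; // e.g. a Person or Location class, given only by its URI
        private final Map<String, List<String>> properties = new LinkedHashMap<>();

        EntityAnnotation(URI entity, URI type) {
            this.entity = entity;
            this.type = type;
        }

        URI getEntity() { return entity; }

        URI getType() { return type; }

        // Properties may be multi-valued, so each key maps to a list.
        void addProperty(String key, String value) {
            properties.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        List<String> getProperty(String key) {
            List<String> values = properties.get(key);
            return values == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(values);
        }
    }

    // Made-up sample values, for illustration only.
    static EntityAnnotation sample() {
        EntityAnnotation a = new EntityAnnotation(
            URI.create("http://dbpedia.org/resource/Barack_Obama"),
            URI.create("http://dbpedia.org/ontology/Person"));
        a.addProperty("rdfs:label", "Barack Obama");
        return a;
    }

    public static void main(String[] args) {
        EntityAnnotation a = sample();
        System.out.println(a.getEntity() + " a " + a.getType());
        System.out.println(a.getProperty("rdfs:label"));
    }
}
```

With this shape, a new domain type such as Sports only introduces a new URI value, not a new Java class.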

What Fabian says still applies here, as if somebody now decides to add
another enhancement that is not TextAnnotation, EntityAnnotation, Category
or Language, then a new class would need to be added. I think this problem
is handled in UIMA by passing JCas [1] objects around (I am no UIMA expert)

I think this is where Rupert wanted JSON objects to be passed, so that no
Java classes would need to be generated. Did I get it right?

Another way would be to just keep it simple (for the consumer) and release
new SINRs every time a new type of enhancement engine is added.

Cheers,
Pablo

[1]
http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.defining_types

On Mon, Jun 4, 2012 at 10:55 AM, Reto Bachmann-Gmür <re...@apache.org> wrote:

> On Wed, May 9, 2012 at 8:08 PM, Fabian Christ
> <ch...@googlemail.com> wrote:
>
> > Hi Bertrand,
> >
> > 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> > > On Wed, May 9, 2012 at 12:39 AM, Fabian Christ
> > > <ch...@googlemail.com> wrote:
> > >> 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> > >>>...
> > >>> interface Category {
> > >>>  String getId();
> > >>>  String getLabel();
> > >>>  Category getParent();
> > >>> }
> > >>
> > >> So for each type of enhancement you would have an interface - is this
> > >> the idea?...
> > >
> > > Yes, so that a user can say "I'd like to add Categories to this
> > > content" or "I'm looking for Entities in my content", etc.
> >
> > I see. I am wondering how often we would have to change the API with
> > this approach. The nice thing of having the data represented
> > independent of the API is that we can change the data schema without
> > changing the API. Now with API classes for categories, persons, etc.
> > wouldn't it mean that we have to change the API if someone wants to
> > add a new type of enhancement? But this would contradict our
> > flexible enhancement architecture where we do not know what type of
> > enhancements future engines may produce. Any ideas?
> >
>
> I think Bertrand's examples basically draft a Java API for SKOS. I think
> SKOS is likely to be a major use case, so it makes sense to provide
> convenient APIs for accessing it. I think the naming "is not rdf", however,
> is a questionable marketing strategy. It is true that as long as you are
> accessing the data via the API you don't have to deal with or know about
> triples. But providing easy access at the price of expressivity doesn't
> require denying the (or one possible) underlying technology. So I'd rather
> focus on what it is: JSKOS, Categories4J, whatever....
>
> Cheers,
> Reto
>

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Reto Bachmann-Gmür <re...@apache.org>.
On Wed, May 9, 2012 at 8:08 PM, Fabian Christ
<ch...@googlemail.com> wrote:

> Hi Bertrand,
>
> 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> > On Wed, May 9, 2012 at 12:39 AM, Fabian Christ
> > <ch...@googlemail.com> wrote:
> >> 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> >>>...
> >>> interface Category {
> >>>  String getId();
> >>>  String getLabel();
> >>>  Category getParent();
> >>> }
> >>
> >> So for each type of enhancement you would have an interface - is this
> >> the idea?...
> >
> > Yes, so that a user can say "I'd like to add Categories to this
> > content" or "I'm looking for Entities in my content", etc.
>
> I see. I am wondering how often we would have to change the API with
> this approach. The nice thing of having the data represented
> independent of the API is that we can change the data schema without
> changing the API. Now with API classes for categories, persons, etc.
> wouldn't it mean that we have to change the API if someone wants to
> add a new type of enhancement? But this would contradict our
> flexible enhancement architecture where we do not know what type of
> enhancements future engines may produce. Any ideas?
>

I think Bertrand's examples basically draft a Java API for SKOS. I think
SKOS is likely to be a major use case, so it makes sense to provide
convenient APIs for accessing it. I think the naming "is not rdf", however,
is a questionable marketing strategy. It is true that as long as you are
accessing the data via the API you don't have to deal with or know about
triples. But providing easy access at the price of expressivity doesn't
require denying the (or one possible) underlying technology. So I'd rather
focus on what it is: JSKOS, Categories4J, whatever....

Cheers,
Reto

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Fabian Christ <ch...@googlemail.com>.
Hi Bertrand,

2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> On Wed, May 9, 2012 at 12:39 AM, Fabian Christ
> <ch...@googlemail.com> wrote:
>> 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
>>>...
>>> interface Category {
>>>  String getId();
>>>  String getLabel();
>>>  Category getParent();
>>> }
>>
>> So for each type of enhancement you would have an interface - is this
>> the idea?...
>
> Yes, so that a user can say "I'd like to add Categories to this
> content" or "I'm looking for Entities in my content", etc.

I see. I am wondering how often we would have to change the API with
this approach. The nice thing of having the data represented
independent of the API is that we can change the data schema without
changing the API. Now with API classes for categories, persons, etc.
wouldn't it mean that we have to change the API if someone wants to
add a new type of enhancement? But this would contradict our
flexible enhancement architecture where we do not know what type of
enhancements future engines may produce. Any ideas?

-- 
Fabian
http://twitter.com/fctwitt

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi Fabian,

On Wed, May 9, 2012 at 12:39 AM, Fabian Christ
<ch...@googlemail.com> wrote:
> 2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
>>...
>> interface Category {
>>  String getId();
>>  String getLabel();
>>  Category getParent();
>> }
>
> So for each type of enhancement you would have an interface - is this
> the idea?...

Yes, so that a user can say "I'd like to add Categories to this
content" or "I'm looking for Entities in my content", etc.

> ...Can you explain how this corresponds to the enhancement
> structure defined at [1]? ...

I purposely didn't take that into account - my starting point is "what
API would make Stanbol and similar services really easy to use for
people who are not familiar with this stuff".

But now that you ask, let's try:

> ...What kind of information will become an
> interface - is it the dc:type?...

Looking at [1] the dc:types are for example "heading, section,
sentence, Language, Location, Person, Location, Entity".

I think I'm looking at something different - if I'm asking for Topic
(a SINR interface) for example, my question is "which Topics does my
content relate to, as a whole" and the answer might be "human rights,
finance, countries/USA".

If I'm asking for Entities, on the other hand, the question is "please
point me to known entities in my content" and the answer might be
"Barack Obama (from character index 12 to 23 and 146 to 150), USA
(from character index 27 to 30)".

So far, I haven't looked at how we would implement this...just trying
to get it right at the API level. I imagine we might need different
interpretations of the enhancement results to produce the above
answers.
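
To illustrate the difference between the two questions, here is a rough sketch of what the answers could look like as plain objects. Topic, EntityMention and Span are hypothetical names, and the sample data simply restates the examples above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: Topics describe the content as a whole, Entities point into it.
final class TopicEntitySketch {

    static final class Topic {
        final String label;

        Topic(String label) {
            this.label = label;
        }
    }

    // A character range in the content, using the indexes from the example above.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    static final class EntityMention {
        final String label;
        final List<Span> spans; // one entity can be mentioned several times

        EntityMention(String label, Span... spans) {
            this.label = label;
            this.spans = Collections.unmodifiableList(Arrays.asList(spans));
        }
    }

    static List<Topic> sampleTopics() {
        return Arrays.asList(
            new Topic("human rights"), new Topic("finance"), new Topic("countries/USA"));
    }

    static List<EntityMention> sampleEntities() {
        return Arrays.asList(
            new EntityMention("Barack Obama", new Span(12, 23), new Span(146, 150)),
            new EntityMention("USA", new Span(27, 30)));
    }

    public static void main(String[] args) {
        for (Topic t : sampleTopics()) {
            System.out.println("topic: " + t.label);
        }
        for (EntityMention e : sampleEntities()) {
            for (Span s : e.spans) {
                System.out.println("entity: " + e.label + " at " + s.start + "-" + s.end);
            }
        }
    }
}
```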

-Bertrand

> [1] http://wiki.iks-project.eu/index.php/EnhancementStructure

Re: [RT] SINR - a simplified client API for content enhancement

Posted by Fabian Christ <ch...@googlemail.com>.
Hi,

thanks for starting such an initiative to make things simpler ;)

2012/5/9 Bertrand Delacretaz <bd...@apache.org>:
> Simple interfaces like Category, Annotation, Keyword are used to
> represent content enhancements.
>
> Here's Category, for example (credits to Reto for this one). Plain and simple:
>
> interface Category {
>  String getId();
>  String getLabel();
>  Category getParent();
> }

So for each type of enhancement you would have an interface - is this
the idea? Can you explain how this corresponds to the enhancement
structure defined at [1]? What kind of information will become an
interface - is it the dc:type?

[1] http://wiki.iks-project.eu/index.php/EnhancementStructure

-- 
Fabian
http://twitter.com/fctwitt