Posted to dev@stanbol.apache.org by Rupert Westenthaler <ru...@gmail.com> on 2013/01/11 07:31:59 UTC

Re: InsideOut10 contributed engines

Hi all,

This is an update on the Freeling integration, based on a discussion
between Fabian, David and myself last week.

As mentioned earlier in this thread, Freeling [1] is GPL licensed.
Because of that, Apache Stanbol cannot link directly to the Freeling
APIs. The best solution for this issue is to access the Freeling
functionality through a web service, and this is exactly what we
decided to work on.

However, as these license issues are not specific to Freeling but
also apply to other NLP frameworks one would like to integrate with
Apache Stanbol, we decided to opt for a more generic approach. In the
following I will describe the system we are currently working on.

## JSON support for the Stanbol AnalyzedText ContentPart

STANBOL-878 [2] adds support for JSON parsing and serialization to the
AnalyzedText ContentPart [3,4]. This will be the preferred format for
sending Stanbol-compatible NLP processing results over the wire. Both
the parser and the serializer are extensible, meaning that users who
want to use custom Annotations can provide components that ensure
those Annotation values are correctly serialized to and parsed from
JSON. In addition, both the parser and the serializer are usable inside
and outside an OSGi environment.
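
What "extensible" means for the JSON parser/serializer can be pictured as a registry of per-type codecs. The following is only a minimal Python sketch of that pattern: the `{"type": ..., "value": ...}` layout, `register_codec` and the `PosTag` type are hypothetical and do not reflect the actual STANBOL-878 JSON format or its (Java) API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class PosTag:
    """Hypothetical custom annotation value (not a Stanbol class)."""
    tag: str
    probability: float


# Registry: type key -> (Python class, serializer, parser).
# Plug-in components register a codec for each custom annotation type.
_CODECS: Dict[str, Tuple[type, Callable[[Any], dict], Callable[[dict], Any]]] = {}


def register_codec(key: str, cls: type,
                   to_json: Callable[[Any], dict],
                   from_json: Callable[[dict], Any]) -> None:
    _CODECS[key] = (cls, to_json, from_json)


def serialize_annotation(value: Any) -> dict:
    """Wrap a value as {"type": key, "value": {...}} using its registered codec."""
    for key, (cls, to_json, _) in _CODECS.items():
        if isinstance(value, cls):
            return {"type": key, "value": to_json(value)}
    raise ValueError(f"no codec registered for {type(value).__name__}")


def parse_annotation(obj: dict) -> Any:
    """Inverse of serialize_annotation; fails loudly on unknown types."""
    codec = _CODECS.get(obj.get("type"))
    if codec is None:
        raise ValueError(f"no codec registered for {obj.get('type')!r}")
    return codec[2](obj["value"])


# A plug-in would ship a registration like this one for its own type.
register_codec(
    "pos", PosTag,
    lambda p: {"tag": p.tag, "prob": p.probability},
    lambda d: PosTag(d["tag"], d["prob"]),
)

if __name__ == "__main__":
    wire = json.dumps(serialize_annotation(PosTag("NN", 0.9)))
    print(wire)
    print(parse_annotation(json.loads(wire)))
```

The core (de)serializer never needs to know about `PosTag`; it only looks up whatever codecs have been registered, which is the property that lets third parties add their own Annotation types.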

## RemoteNlpProcessing Engine

Stanbol will also provide a default NLP processing EnhancementEngine
that calls a remote RESTful service. The corresponding JIRA issue will
follow soon.

This engine will send the plain-text content part of the processed
ContentItem as a POST request to the configured RESTful endpoint.

The RESTful API will include two endpoints.

1. __Supported Languages__: A simple GET request to
"{endpoint}/supported" that expects the supported languages as a JSON
array. This will be called during the activation of the Engine to
synchronize the language configuration of the Engine with the
languages supported by the NLP processing service. For example, if
the user configures the EnhancementEngine with "!en,!de, *" and the
NLP processing service reports "{languages: [en,es,it,pt]}", then the
combined configuration will be "es, it, pt".

2. __NLP processing__: A POST request providing the plain text and
expecting the JSON-serialized AnalyzedText as response. The
"Content-Language" header will be used to specify the language of the
submitted text. If the language is unknown, the header will be omitted.
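
The merge rule in the "Supported Languages" example can be written down as follows. This is a minimal Python sketch under stated assumptions: entries are comma separated, "!xx" excludes a language, "*" accepts every language that is not excluded, and without "*" only explicitly listed languages are accepted. The real engine may support additional configuration syntax.

```python
from typing import Iterable, List, Set, Tuple


def parse_language_config(config: str) -> Tuple[Set[str], Set[str], bool]:
    """Split e.g. "!en,!de, *" into (included, excluded, wildcard)."""
    included: Set[str] = set()
    excluded: Set[str] = set()
    wildcard = False
    for entry in config.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue  # tolerate empty entries such as "en,,de"
        if entry == "*":
            wildcard = True  # "*" accepts every language not excluded
        elif entry.startswith("!"):
            excluded.add(entry[1:].strip())
        else:
            included.add(entry)
    return included, excluded, wildcard


def combine_languages(config: str, supported: Iterable[str]) -> List[str]:
    """Languages the service supports AND the engine configuration allows.

    Order follows the service's list; exclusions win over the wildcard.
    """
    included, excluded, wildcard = parse_language_config(config)
    combined = []
    for lang in supported:
        lang = lang.strip().lower()
        if lang in excluded:
            continue
        if wildcard or lang in included:
            combined.append(lang)
    return combined


if __name__ == "__main__":
    # The example from the text: prints "es, it, pt"
    print(", ".join(combine_languages("!en,!de, *", ["en", "es", "it", "pt"])))
```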

Calls to the remote service will go through a simple interface,
allowing users to override the default implementation and adapt the
calls to servers that use a different API.
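
A pluggable client could look roughly like the sketch below (Python for brevity; the engine itself would be Java). `NlpServiceClient`, `DefaultRestClient` and `QueryParamClient` are hypothetical names, and sending the POST to `{endpoint}` itself is an assumption, since only the `/supported` path is given above. The default client omits the Content-Language header when the language is unknown; the subclass shows how a server with a different API could be supported by overriding a single method.

```python
import json
import urllib.parse
import urllib.request
from abc import ABC, abstractmethod
from typing import List, Optional


class NlpServiceClient(ABC):
    """Hypothetical interface the engine would call; not a real Stanbol API."""

    @abstractmethod
    def supported_languages(self) -> List[str]:
        """Languages the remote service can process."""

    @abstractmethod
    def analyse(self, text: str, language: Optional[str]) -> dict:
        """Return the parsed JSON AnalyzedText for `text`."""


class DefaultRestClient(NlpServiceClient):
    """Default implementation of the two endpoints described above.

    Assumes the POST goes to {endpoint} itself (the text does not say).
    """

    def __init__(self, endpoint: str, timeout: float = 30.0):
        self.endpoint = endpoint.rstrip("/")
        self.timeout = timeout

    def build_analysis_request(self, text: str,
                               language: Optional[str]) -> urllib.request.Request:
        headers = {"Content-Type": "text/plain; charset=UTF-8"}
        if language:  # omit Content-Language when the language is unknown
            headers["Content-Language"] = language
        return urllib.request.Request(self.endpoint, data=text.encode("utf-8"),
                                      headers=headers, method="POST")

    def supported_languages(self) -> List[str]:
        url = self.endpoint + "/supported"
        with urllib.request.urlopen(url, timeout=self.timeout) as resp:
            data = json.load(resp)
        # Accept a bare array as well as {"languages": [...]}.
        return data["languages"] if isinstance(data, dict) else data

    def analyse(self, text: str, language: Optional[str]) -> dict:
        req = self.build_analysis_request(text, language)
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return json.load(resp)


class QueryParamClient(DefaultRestClient):
    """Hypothetical override for a server expecting ?lang=xx instead of a header."""

    def build_analysis_request(self, text, language):
        url = self.endpoint
        if language:
            url += "?" + urllib.parse.urlencode({"lang": language})
        return urllib.request.Request(
            url, data=text.encode("utf-8"),
            headers={"Content-Type": "text/plain; charset=UTF-8"}, method="POST")
```

Keeping request construction in its own method means an adapter for another server only has to override that method, while the engine keeps calling the same interface.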

## Freeling NLP processing Server

This is the server-side component for Freeling. The implementation
will be based on the contributed FreelingEngine [5], but refactored to
run as a standalone server providing a RESTful API compatible with the
RemoteNlpProcessing Engine. As this code needs to link to the GPL
licensed Freeling API, it will not become part of Apache Stanbol but
will remain in a separate code repository.

## Summary

While originally implemented for the Freeling integration, this
infrastructure is intended to allow the integration of many more NLP
processing frameworks. As both the JSON serialization for AnalyzedText
and the RemoteNlpProcessing Engine will be part of the default Stanbol
distribution, it will be possible to integrate external NLP processing
frameworks simply by adding an Engine configuration to Apache Stanbol.

best
Rupert Westenthaler


[1] http://nlp.lsi.upc.edu/freeling/
[2] https://issues.apache.org/jira/browse/STANBOL-878
[3] https://issues.apache.org/jira/browse/STANBOL-734
[4] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
[5] https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine

On Mon, Dec 17, 2012 at 3:42 PM, Rupert Westenthaler
<ru...@gmail.com> wrote:
> On Mon, Dec 17, 2012 at 2:40 PM, David Riccitelli <da...@insideout.io> wrote:
>> Hi Rupert,
>>
>> There is client/server mode,
>> http://nlp.lsi.upc.edu/freeling/doc/userman/html/node84.html.
>>
>
> that looks like a socket connection, but I have not seen any
> documentation about the messages one can send/receive.
>
>
>> But I was thinking pretty much what you said. From the point of view of
>> development, we could create a GPL Freeling web service outside of Stanbol,
>> and then have the APL engine query that service. Right?
>>
>
> If we can develop the Engines using the socket connection, we would not
> have any GPL dependencies. However, we would need to generate requests
> and parse responses ourselves. If we go for the JNI API, I would propose
> that we develop the whole Engine outside of Stanbol. We can still release
> it under the APL, but because of the GPL dependencies we cannot distribute
> it with Stanbol. However, as soon as we add it to some Maven repository,
> users can simply reference it in their Stanbol launcher configurations.
>
> best
> Rupert
>
>
>> BR,
>> David
>>
>>
>> On Mon, Dec 17, 2012 at 3:37 PM, Rupert Westenthaler <
>> rupert.westenthaler@gmail.com> wrote:
>>
>>> Hi all
>>>
>>> @David: Is there also a RESTful API or some other kind of (web) service
>>> provided by Freeling? Maybe this would allow us to bypass dependencies on
>>> GPL-licensed code. In any case, we can develop the described engines on
>>> GitHub.
>>>
>>> best
>>> Rupert
>>>
>>> On Wed, Dec 12, 2012 at 1:09 PM, David Riccitelli <da...@insideout.io>
>>> wrote:
>>> > Thanks Rupert,
>>> >
>>> > I'll follow up on the EntitySearcher. Let me know if you need anything else
>>> > from my side.
>>> >
>>> > BR
>>> > David
>>> >
>>> > On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
>>> > rupert.westenthaler@gmail.com> wrote:
>>> >
>>> >> EntitySearcher
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > David Riccitelli
>>> >
>>> >
>>> > ********************************************************************************
>>> > InsideOut10 s.r.l.
>>> > P.IVA: IT-11381771002
>>> > Fax: +39 0110708239
>>> > ---
>>> > LinkedIn: http://it.linkedin.com/in/riccitelli
>>> > Twitter: ziodave
>>> > ---
>>> > Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>>> > ********************************************************************************
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>
>
>

Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

Another update on the Stanbol Freeling integration, especially on the
following part of my previous mail:
> ## Freeling NLP processing Server
>
> This is the server-side component for Freeling. The implementation
> will be based on the contributed FreelingEngine [5], but refactored to
> run as a standalone server providing a RESTful API compatible with the
> RemoteNlpProcessing Engine. As this code needs to link to the GPL
> licensed Freeling API, it will not become part of Apache Stanbol but
> will remain in a separate code repository.

Starting from the Freeling Engine [5], I implemented such a server. As
this server-side component links to GPL-licensed code, we will not be
able to include it in Apache Stanbol. Because of that, it will be
hosted on GitHub in the stanbol-freeling [6] repository.

At the moment this project includes three modules:

1. __freeling-core:__ Provides
    * Freeling initialization based on the Freeling shared directory
and the config files
    * a resource pool for Analyzers and Language Identification to
allow concurrent processing of submitted texts
    * mappings from the tags used by Freeling to the OLiA ontology
based concepts defined by the Stanbol NLP processing module
    * conversion of the Freeling annotation structure to the Stanbol
AnalysedText ContentPart
2. __freeling-web:__ Provides the implementation of the JAX-RS
resources required for the RESTful API. Currently there are three
services:
    * `POST -H "Content-Type: text/plain" /langident` returning a JSON
description of the detected languages.
    * `GET /analysis` returning the supported languages for the
analysis endpoint as a JSON array.
    * `POST -H "Content-Type: text/plain" -H "Content-Language"
/analysis` returning the JSON-serialized AnalysedText ContentPart
with the analysis results for the submitted text. Note that the
"Content-Language" header can (and should) be used to explicitly pass
the language of the submitted text. If this header is missing, the
service will try to detect the language of the submitted text. The
response will also include a "Content-Language" header holding the
passed or detected language.
3. __freeling-server:__ Provides a runnable JAR that can be used to run
a Freeling RESTful endpoint based on
    * the Jetty embedded web server
    * Apache Wink as JAX-RS implementation
    * Freeling 3.0, which needs to be installed on the local machine
    * the Freeling shared folder (configurable)
    * the Freeling config folder (configurable). A default configuration
can be found in the freeling-config folder under [6]
    * the Freeling native library (configurable). The native libs for Mac
and Linux can be found in the freeling-config folder at [6]
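
The resource pool in freeling-core is only described at a high level. A common way to share expensive, non-thread-safe objects (such as per-language analyzers) between concurrent requests is a bounded blocking pool; the Python sketch below shows that pattern with `queue.Queue`. `ResourcePool` and its `factory` argument are hypothetical and are not the stanbol-freeling code, which is Java on top of the native Freeling library.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class ResourcePool(Generic[T]):
    """Bounded pool: each resource is used by at most one thread at a time."""

    def __init__(self, factory: Callable[[], T], size: int):
        if size < 1:
            raise ValueError("size must be >= 1")
        self._items: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            # Create all resources up front (expensive initialization happens once).
            self._items.put(factory())

    @contextmanager
    def acquire(self, timeout: Optional[float] = None) -> Iterator[T]:
        """Borrow a resource; blocks until one is free.

        Raises queue.Empty if none becomes free within `timeout` seconds.
        """
        item = self._items.get(timeout=timeout)
        try:
            yield item
        finally:
            self._items.put(item)  # always hand the resource back
```

A server would create one pool per language at startup, for example `ResourcePool(make_analyzer, size=4)` with a hypothetical `make_analyzer` factory, and wrap each request in `with pool.acquire() as analyzer: ...`. Bounding the pool caps memory use, and callers simply wait while all analyzers are busy.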

Currently you can test the server by using CURL requests like

    curl -i -X POST -H "Content-Type: text/plain" -T es.txt http://localhost:8080/langident
    curl -i -X POST -H "Content-Type: text/plain" -H "Content-Language: ru" -T ru.txt http://localhost:8080/analysis

but the Stanbol-side EnhancementEngine implementations that will use
these services will become available shortly (they are the next task
on my TODO list).
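
For clients written in Python, the two curl calls above could be mirrored with the standard library as sketched below. The base URL comes from the curl examples; encoding the text as UTF-8 and the helper names are assumptions. Omitting `language` relies on the server-side language detection described for the /analysis endpoint, and `send` returns the response's Content-Language header so the caller can see which language was passed or detected.

```python
import urllib.request
from typing import Optional, Tuple

BASE_URL = "http://localhost:8080"  # host/port taken from the curl examples


def langident_request(text: str, base: str = BASE_URL) -> urllib.request.Request:
    """Like: curl -X POST -H "Content-Type: text/plain" -T es.txt {base}/langident"""
    return urllib.request.Request(base + "/langident", data=text.encode("utf-8"),
                                  headers={"Content-Type": "text/plain"}, method="POST")


def analysis_request(text: str, language: Optional[str] = None,
                     base: str = BASE_URL) -> urllib.request.Request:
    """Like the /analysis curl call; without `language` the server detects it."""
    headers = {"Content-Type": "text/plain"}
    if language is not None:
        headers["Content-Language"] = language
    return urllib.request.Request(base + "/analysis", data=text.encode("utf-8"),
                                  headers=headers, method="POST")


def send(req: urllib.request.Request, timeout: float = 60.0) -> Tuple[str, Optional[str]]:
    """Return (JSON body as text, Content-Language response header or None)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8"), resp.headers.get("Content-Language")
```

Calling `send(analysis_request(text, "ru"))` requires a running freeling-server on localhost:8080.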

best
Rupert

[5] https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine
[6] https://github.com/insideout10/stanbol-freeling
