Posted to dev@stanbol.apache.org by Rupert Westenthaler <ru...@gmail.com> on 2012/12/12 12:46:15 UTC

Re: InsideOut10 contributed engines

Hi all,

Just a short update on this.

## Freeling Integration

In the meantime I exchanged 30+ mails with David to get Freeling [3]
installed on my Mac. After fixing several issues with library
versions, the build process and finally runtime problems, I now have a
working version. I am currently familiarizing myself with the
Freeling API while checking how best to align the Freeling framework
with the Stanbol NLP processing module.

* Freeling Language Identification: This would only add detection for
"ca" and "gl" to what is currently supported by the langdetect
engine. In addition, the example for "sr" (Serbian) is classified as
"mk" (Macedonian).

* Freeling PoS Tagging: While the code performs all available analysis,
currently only nouns are extracted and used. A full integration with
the Stanbol NLP processing needs considerable refactoring and
extension.

Based on my current understanding I would:

* create a {trunk}/enhancement-engines/freeling folder with the
following modules
    * freeling-definitions: Module that defines all the constants and
mappings, e.g. the mappings from the POS tags used by Freeling to the
Olia Ontology used by the Stanbol NLP processing. The reason for
having this in its own module is that Freeling is GPL licensed and
this module, while representing a major part of the work, will not
need to depend on the Freeling API. This will allow us to release
those things under the Apache License.
    * freeling-service: Initialises the Freeling framework and registers
the Freeling analysis services with the OSGI ServiceAdmin. This module
will allow for configuring Freeling (such as providing the path to the
Freeling configuration and the native libraries) and also allow other
modules to look up Freeling functionality by using @Reference
annotations and/or an OSGI ServiceTracker.
    * freeling-langid: Same as the contributed engine
    * freeling-analyse: Does all the analysis steps (token, sentence,
pos, chunk, ner, lemma ...) and stores the results in the AnalyzedText.
Splitting up the different analysis steps is not possible, as the JNI
wrapper does not allow elements to be constructed manually (e.g. a
WordList needed as input for the sentence detector). However, the
LanguageConfiguration utility of the stanbol.enhancer.nlp module
will be used to allow activating/deactivating analysis steps by
default and/or for specific languages (e.g. *;ner=false, en;ner=true).
The AnalyzedText also already supports merging of results provided
by different engines (e.g. if you add a Chunk that already exists, the
existing Chunk will be returned and the annotations of the different
engines will be merged).
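
To make the per-language switches (e.g. *;ner=false, en;ner=true) more
concrete, here is a minimal sketch of how such a configuration string
could be evaluated. It is NOT the actual LanguageConfiguration API of
the stanbol.enhancer.nlp module; the class and method names are made
up for illustration, and treating a step as enabled when nothing is
configured is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Stanbol: evaluates strings such as
// "*;ner=false, en;ner=true".
class AnalysisStepConfigSketch {

    /** Parses the config into language -> (parameter -> value). */
    static Map<String, Map<String, String>> parse(String config) {
        Map<String, Map<String, String>> byLanguage = new HashMap<>();
        for (String entry : config.split(",")) {
            String[] parts = entry.trim().split(";");
            String language = parts[0].trim(); // "*" is the default entry
            Map<String, String> params =
                    byLanguage.computeIfAbsent(language, k -> new HashMap<>());
            for (int i = 1; i < parts.length; i++) {
                String[] keyValue = parts[i].split("=", 2);
                if (keyValue.length == 2) {
                    params.put(keyValue[0].trim(), keyValue[1].trim());
                }
            }
        }
        return byLanguage;
    }

    /** A language specific setting wins over the "*" default. */
    static boolean isEnabled(Map<String, Map<String, String>> config,
                             String language, String step) {
        Map<String, String> specific = config.get(language);
        if (specific != null && specific.containsKey(step)) {
            return Boolean.parseBoolean(specific.get(step));
        }
        Map<String, String> defaults = config.get("*");
        if (defaults != null && defaults.containsKey(step)) {
            return Boolean.parseBoolean(defaults.get(step));
        }
        return true; // assumption: steps are active unless switched off
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> config = parse("*;ner=false, en;ner=true");
        System.out.println("en ner: " + isEnabled(config, "en", "ner"));
        System.out.println("de ner: " + isEnabled(config, "de", "ner"));
    }
}
```

With the example string, NER would run for English texts only, while
all other steps stay active for every language.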

### License considerations:

Freeling is licensed under GPLv3, a license that is NOT compatible with
the Apache License. This is also true for the Java API (a JNI wrapper
over the native code) and all modules that depend on this API -
basically the freeling-* modules.

For Freeling itself this is not a big issue, as Freeling needs to be
downloaded, compiled and installed separately anyway on the machine
running the Stanbol Freeling engines. The main hurdle is the Java API
we need to link against. Because of that, at least freeling-service,
freeling-langid and freeling-analyse will not be releasable by
Apache Stanbol.

As I do not have any experience in dealing with situations like
this, comments would be very welcome.



## TextAnnotations New Model

We need to define an Issue for switching to this model. My suggestion
is that we make the change immediately after the next 0.10.0 release.

### Some explanation about the new fise:TextAnnotation model

See also the documentation at the end of the fise:TextAnnotation section at [2]

This will change fise:TextAnnotations to adopt

* fise:selection-prefix: some words/characters before the selected section.
* fise:selection-suffix: some words/characters after the selected section.

In addition it will introduce

* fise:selection-head: the first few words/characters of the selected
section within the text.
* fise:selection-tail: the last few words/characters of the selected
section. To be used together with fise:selection-head.

Those two properties are alternatives to fise:selected-text and are
intended to be used in cases where EnhancementEngines want to select
whole sentences, paragraphs or even sections of the text. The main
intention is to avoid repeating long parts of the text as a literal
in the RDF graph. Word and phrase level annotations will not be
affected by this.
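
As an illustration of the choice between fise:selected-text and the
head/tail pair, here is a minimal sketch. The length threshold and the
number of head/tail characters are assumptions picked for the example
and are not defined by the model; the class and method names are
hypothetical as well.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decides which selection properties to write
// for the span [start, end) of the text.
class SelectionSketch {

    static final int MAX_SELECTED_TEXT = 200; // assumed threshold
    static final int HEAD_TAIL_LENGTH = 15;   // assumed head/tail size

    static Map<String, String> describe(String text, int start, int end) {
        String selection = text.substring(start, end);
        Map<String, String> properties = new LinkedHashMap<>();
        if (selection.length() <= MAX_SELECTED_TEXT) {
            // word/phrase level annotation: keep the full literal
            properties.put("fise:selected-text", selection);
        } else {
            // sentence/paragraph level: store only the first and last characters
            properties.put("fise:selection-head",
                    selection.substring(0, HEAD_TAIL_LENGTH));
            properties.put("fise:selection-tail",
                    selection.substring(selection.length() - HEAD_TAIL_LENGTH));
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(describe("Paris is the capital of France.", 0, 5));
    }
}
```

A client can then locate a long selection by searching for the head
and the tail, without the whole passage being repeated in the graph.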

The fise:selection-context will still be supported, but its semantics
will change to describing those parts of the content that were used
as context for the annotation. Its use for identifying the correct
location of the annotation within the text will be discouraged after
this change.

### Contributed Engine

I strongly suggest accepting this engine, as it provides a good
solution in case users want to use Engines that still use the current
model with client side code that is written for the new model.



## Freebase Entity Recognition

I have not yet had time to look into this in more detail. One thing I
would like to check is whether it is feasible to implement the
EntitySearcher [5] used by the new EntityLinkingEngine, because in that
case the integration should be really straightforward. In addition,
future enhancements to the EntityLinking process would automatically
also apply to this engine.

David, could you have a look at the EntitySearcher [5] interface? You
can also use the EntityhubLinkingEngine (docs: [6], source: [7]) as an
example of how to use the generic EntityLinkingEngine with a specific
EntitySearcher implementation.




## Schema.org Refactorer

Had no time to look at this



# Next Steps

* I plan to create JIRA issues for the tasks described above. I
will mark them as replacing STANBOL-807 [4]. Sorry, I need to create
new issues as STANBOL-807 is too broad in scope.
* AFAIK we need code contributions to be uploaded as archives to JIRA.
So if nobody replies to this claiming otherwise, I will ask David to
formally contribute the modules to those issues.
* Resolve license issues with the GPL licensed Freeling

I plan to work on the Freeling stuff first, mainly because I expect
this work to be a very welcome opportunity to validate the Stanbol NLP
processing API.


Thanks for contributing this to the Stanbol Community

best
Rupert

On Thu, Nov 15, 2012 at 8:20 AM, David Riccitelli <da...@insideout.io> wrote:

>  [2] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetextannotation
>  [3] http://nlp.lsi.upc.edu/freeling/
>  [4] https://issues.apache.org/jira/browse/STANBOL-807

[5] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entitysearcher
[6] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entityhublinking
[7] http://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/entityhublinking/src/main/java/org/apache/stanbol/enhancer/engines/entityhublinking/EntityhubLinkingEngine.java
--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

Another update on the Stanbol Freeling integration, especially on the

> ## Freeling NLP processing Server
>
> This is the Server side component for Freeling. The implementation
> will be based on the contributed FreelingEngine [5]. However
> refactored to run as a standalone Server providing a RESTful API
> compatible with the RemoteNlpProcessing Engine. As this code needs to
> link to the GPL licensed Freeling API it will not become part of
> Apache Stanbol but remain in a separate code repository.

Starting from the Freeling Engine [5] I implemented such a server. As
this server-side component links to GPL licensed code, we will not be
able to include it in Apache Stanbol. Because of that it will be
hosted on GitHub in the stanbol-freeling [6] repository.

This project currently includes three modules

1. __freeling-core:__ Provides the
    * Freeling initialization based on the Freeling shared directory
and the config files
    * a resource pool for Analyzers and Language Identification to
allow concurrent processing of submitted texts
    * mappings from the tags used by Freeling to the Olia ontology
based concepts defined by the Stanbol NLP processing module
    * conversion of the Freeling annotation structure to the Stanbol
AnalysedText ContentPart
2. __freeling-web:__ Provides the implementation of the JAX-RS
resources required for the RESTful API. Currently there are three
services
    * `POST -H "Content-Type: text/plain" /langident` returning a JSON
description of the detected languages.
    * `GET /analysis` returning the supported languages for the
analysis endpoint as a JSON array
    * `POST -H "Content-Type: text/plain" -H "Content-Language" /analysis`
returning the JSON serialized AnalysedText ContentPart
with the analysis results for the submitted text. Note that the
"Content-Language" header can (should) be used to explicitly specify
the language of the submitted text. If this header is missing, then the
service will try to detect the language of the text. The
response will also include the "Content-Language" header holding the
specified or detected language.
3. __freeling-server:__ Provides a runnable JAR that can be used to run
a Freeling RESTful endpoint based on
    * Jetty as embedded web server
    * Apache Wink as JAX-RS implementation
    * Freeling 3.0, which needs to be installed on the local machine
    * Freeling shared folder (configurable)
    * Freeling config folder (configurable). A default configuration
can be found in the freeling-config folder under [6]
    * Freeling native library (configurable). The native libs for Mac
and Linux can be found at [6] in the freeling-config folder
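
The resource pool mentioned under freeling-core can be pictured
roughly as follows. This is only a generic sketch built on
java.util.concurrent and not the code from the stanbol-freeling
repository; the class name and methods are hypothetical, and the
pooled type stands in for whatever analyzer object is used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: each request borrows one resource
// (e.g. an analyzer), uses it exclusively and hands it back.
class ResourcePoolSketch<T> {

    private final BlockingQueue<T> idle;

    ResourcePoolSketch(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Runs the task with a pooled resource, waiting until one is free. */
    <R> R withResource(Function<T, R> task) {
        T resource;
        try {
            resource = idle.take(); // blocks while all resources are in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a resource", e);
        }
        try {
            return task.apply(resource);
        } finally {
            idle.add(resource); // always return the resource to the pool
        }
    }

    int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        ResourcePoolSketch<StringBuilder> pool =
                new ResourcePoolSketch<>(2, StringBuilder::new);
        int length = pool.withResource(sb -> sb.append("text").length());
        System.out.println(length + " chars, " + pool.available() + " idle");
    }
}
```

The pool size then bounds how many texts are analysed at the same
time, which matters if the native objects are expensive to create or
not safe to share between threads.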

Currently you can test the server by using curl requests like

    curl -i -X POST -H "Content-Type: text/plain" -T es.txt http://localhost:8080/langident
    curl -i -X POST -H "Content-Type: text/plain" -H "Content-Language: ru" -T ru.txt http://localhost:8080/analysis

but the Stanbol side EnhancementEngine implementations that will use
those services will become available shortly (this is the next task on
my TODO list).
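
For those who prefer to call the analysis endpoint from Java, the
request could be built roughly like this. It is a sketch using the
java.net.http API of Java 11, not the Stanbol engine itself; the class
name is hypothetical and the base URL is the one from the curl
examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper for the /analysis endpoint.
class FreelingRequestSketch {

    /**
     * Builds the POST request for the given text. The language may be
     * null, in which case no "Content-Language" header is sent and the
     * server is expected to detect the language itself.
     */
    static HttpRequest analysisRequest(String baseUrl, String text, String language) {
        HttpRequest.Builder builder = HttpRequest
                .newBuilder(URI.create(baseUrl + "/analysis"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8));
        if (language != null) {
            builder.header("Content-Language", language);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        HttpRequest request = analysisRequest("http://localhost:8080", "Hola mundo", "es");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending it is then a matter of
`HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`,
with the JSON serialized AnalysedText in the response body.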

best
Rupert

> [5] https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine
[6] https://github.com/insideout10/stanbol-freeling





Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all,

This is an update on the Freeling integration based on a discussion
between Fabian, David and me last week.

As mentioned earlier in this thread, Freeling [1] is GPL licensed.
Because of that Apache Stanbol cannot directly link to the Freeling
APIs. The best solution for this issue is to use a web service to
access the Freeling functionality, and this is exactly what we decided
to work on.

However, as the license issues are not specific to Freeling but also
apply to other NLP frameworks one would like to integrate with Apache
Stanbol, the decision was to opt for a more generic approach. In the
following I will provide information on the system we are currently
working on.

## JSON support for the Stanbol AnalyzedText ContentPart

STANBOL-878 [2] adds support for JSON parsing and serialization to the
AnalyzedText ContentPart [3,4]. This will be the preferred format for
sending Stanbol compatible NLP processing results over the wire. Both
the parser and the serializer are extensible, meaning that users who
want to use special Annotations can also provide components that ensure
that those Annotation values are correctly serialized to/parsed from
JSON. In addition, both parser and serializer are usable within and
outside an OSGI environment.

## RemoteNlpProcessing Engine

Stanbol will also provide a default NLP processing EnhancementEngine
that calls a remote RESTful service. The corresponding JIRA issue will
follow soon.

This engine will send the plain text content part of the
processed ContentItem as a POST request to the configured RESTful
endpoint.

The RESTful API will include two endpoints.

1. __Supported Languages__: A simple GET request to
"{endpoint}/supported" that expects the supported languages as a JSON
array. This will be called during the activation of the Engine to
synchronize the language configuration of the Engine with the
supported languages of the NLP processing service. As an example, if
the user configures the EnhancementEngine with "!en,!de, *" and the
NLP processing service reports "{languages: [en,es,it,pt]}", then the
combined configuration will be "es, it, pt".

2. __NLP processing__: A POST request providing the plain text and
expecting the JSON serialized AnalyzedText as response. The
"Content-Language" header will be used to specify the language of the
submitted text. If the language is unknown, the header will be omitted.

Calls to the remote service will use a simple interface, allowing
users to simply override the default implementation and adapt calls to
servers using a different API.
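
The synchronization of the language configuration with the supported
languages can be sketched as follows. This is only an illustration of
the "!en,!de, *" example above, not the code of the engine; the class
and method names are hypothetical, and the reading of the syntax ("!"
excludes a language, "*" allows all others) is taken from that
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: combines the engine's language configuration
// with the languages reported by the remote NLP service.
class LanguageMergeSketch {

    static List<String> merge(String engineConfig, List<String> supported) {
        Set<String> excluded = new HashSet<>();
        Set<String> included = new HashSet<>();
        boolean wildcard = false;
        for (String token : engineConfig.split(",")) {
            String entry = token.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.equals("*")) {
                wildcard = true;                              // all other languages
            } else if (entry.startsWith("!")) {
                excluded.add(entry.substring(1).trim());      // explicitly excluded
            } else {
                included.add(entry);                          // explicitly included
            }
        }
        List<String> result = new ArrayList<>();
        for (String language : supported) {
            if (excluded.contains(language)) {
                continue;
            }
            if (wildcard || included.contains(language)) {
                result.add(language);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [es, it, pt]
        System.out.println(merge("!en,!de, *", Arrays.asList("en", "es", "it", "pt")));
    }
}
```

Only languages the remote service actually supports survive the
merge, so the engine never forwards a text the service cannot process.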

## Freeling NLP processing Server

This is the server side component for Freeling. The implementation
will be based on the contributed FreelingEngine [5], but refactored
to run as a standalone server providing a RESTful API
compatible with the RemoteNlpProcessing Engine. As this code needs to
link to the GPL licensed Freeling API, it will not become part of
Apache Stanbol but will remain in a separate code repository.

## Summary

While originally implemented for the Freeling integration, the
intention of this infrastructure is to allow the integration of many
more NLP processing frameworks. As both the JSON serialization for
AnalyzedText and the RemoteNlpProcessing Engine will be part
of the default Stanbol distribution, external NLP processing
frameworks can be integrated by adding a simple Engine
configuration to Apache Stanbol.

best
Rupert Westenthaler


[1] http://nlp.lsi.upc.edu/freeling/
[2] https://issues.apache.org/jira/browse/STANBOL-878
[3] https://issues.apache.org/jira/browse/STANBOL-734
[4] http://stanbol.apache.org/docs/trunk/components/enhancer/nlp/analyzedtext
[5] https://github.com/insideout10/wordlift-stanbol/tree/master/freeling-engine





Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
On Mon, Dec 17, 2012 at 2:40 PM, David Riccitelli <da...@insideout.io> wrote:
> Hi Rupert,
>
> There is client/server mode,
> http://nlp.lsi.upc.edu/freeling/doc/userman/html/node84.html.
>

That looks like a socket connection, but I have not seen any
documentation about the messages one can send/receive.


> But I was thinking pretty much what you said. From the point of view of
> development, we could create a GPL Freeling web service outside of Stanbol,
> and then have the APL engine query that service. Right?
>

If we can develop the Engines using the socket connection we would not
have any GPL dependencies. However, we would need to generate requests
and parse responses. If we go for the JNI API, then I would propose that
we develop the whole Engine outside of Stanbol. We can still release
it under APL, but because of the GPL dependencies we cannot distribute
it with Stanbol. However, as soon as we add this to some Maven repo,
users can simply reference it in their Stanbol launcher configurations.

best
Rupert






Re: InsideOut10 contributed engines

Posted by David Riccitelli <da...@insideout.io>.
Hi Rupert,

There is client/server mode,
http://nlp.lsi.upc.edu/freeling/doc/userman/html/node84.html.

But I was thinking pretty much what you said. From the point of view of
development, we could create a GPL Freeling web service outside of Stanbol,
and then have the APL engine query that service. Right?

BR,
David





-- 
David Riccitelli

********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
********************************************************************************

Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi all

@David: Is there also a RESTful API or some other kind of (Web)service
provided by Freeling? Maybe this would allow us to bypass dependencies
on GPL licensed code. In any case we can develop the described engines
on GitHub.

best
Rupert

On Wed, Dec 12, 2012 at 1:09 PM, David Riccitelli <da...@insideout.io> wrote:
> Thanks Rupert,
>
> I'll follow up on the EntitySearcher. Let me know if you need anything else
> from my side.
>
> BR
> David
>
> On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> EntitySearcher



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: InsideOut10 contributed engines

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi David,

yes

On Mon, Dec 17, 2012 at 3:21 PM, David Riccitelli <da...@insideout.io> wrote:
> As far as I can see the EntitySearcher provides representations of Entities
> given their IDs. Correct?
>
> What the Freebase Entity Search engine does is:
>  1. take *nouns* (from the PoS Tagging, or similar)

You will not need to deal with this. The EntitySearcher is only called
with nouns. In detail, the "List<String> search" parameter of

    lookup(UriRef field, Set<UriRef> selectedFields, List<String> search,
           String[] languages, Integer limit)

contains the nouns you need to look for. You should use an OR query
over all elements in the list. The "String[] languages" parameter
contains the languages you should search in.
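
As a rough illustration of the OR query over the "search" list, here is
a minimal sketch. The class name OrQuerySketch, the method buildOrQuery
and the quoted "a" OR "b" syntax are assumptions for illustration only;
they are not part of the Stanbol API, and the actual query syntax
depends on the search service you call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OrQuerySketch {

    /**
     * Joins all non-empty search terms into one OR expression.
     * Each term is trimmed and wrapped in double quotes so that
     * multi-word nouns stay together (hypothetical query syntax).
     */
    public static String buildOrQuery(List<String> search) {
        List<String> quoted = new ArrayList<>();
        for (String term : search) {
            if (term != null && !term.trim().isEmpty()) {
                // escape embedded quotes before wrapping the term
                quoted.add("\"" + term.trim().replace("\"", "\\\"") + "\"");
            }
        }
        return String.join(" OR ", quoted);
    }

    public static void main(String[] args) {
        // prints: "Paris" OR "Eiffel Tower"
        System.out.println(buildOrQuery(Arrays.asList("Paris", "Eiffel Tower")));
    }
}
```

The same query would then be issued once per entry in "languages", or
with a language filter if the backing service supports one.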


>  2. use the Freebase search APIs to find *entities* *IDs*,

Use the search and languages parameters to search for the IDs.

>  3. use a referenced site (the DBpedia Site) to get the fields for those
> entities.
> Therefore we do not use Freebase to retrieve properties (because in
> Freebase they're not as complete as in DBpedia, or anyway are mostly in
> English), but to search the most likely entity IDs. Then we use these IDs
> against the DBpedia Referenced Site (using the *sameAs*).
>

Exactly. Just get the Entityhub ReferencedSite for DBpedia and perform
those queries.

Please also note the documentation at [5] and the Javadoc of the
EntitySearcher interface.


best
Rupert

[5] http://stanbol.apache.org/docs/trunk/components/enhancer/engines/entitylinking#entitysearcher



> Does that make it a possible candidate for an EntitySearcher?
>
> BR
> David
>
>
> On Wed, Dec 12, 2012 at 2:09 PM, David Riccitelli <da...@insideout.io> wrote:
>
>> Thanks Rupert,
>>
>> I'll follow up on the EntitySearcher. Let me know if you need anything
>> else from my side.
>>
>> BR
>> David
>>
>> On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
>> rupert.westenthaler@gmail.com> wrote:
>>
>>> EntitySearcher



--
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: InsideOut10 contributed engines

Posted by David Riccitelli <da...@insideout.io>.
As far as I can see the EntitySearcher provides representations of Entities
given their IDs. Correct?

What the Freebase Entity Search engine does is:
 1. take *nouns* (from the PoS Tagging, or similar)
 2. use the Freebase search APIs to find *entities* *IDs*,
 3. use a referenced site (the DBpedia Site) to get the fields for those
entities.

Therefore we do not use Freebase to retrieve properties (because in
Freebase they're not as complete as in DBpedia, or anyway are mostly in
English), but to search the most likely entity IDs. Then we use these IDs
against the DBpedia Referenced Site (using the *sameAs*).
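
The flow described above could be sketched roughly as follows. The
interfaces IdSearch and SameAsSite, the method names and the sample IDs
are hypothetical stand-ins for the Freebase search API and the DBpedia
Referenced Site; they are not the real Freebase or Entityhub APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FreebaseLookupSketch {

    /** Hypothetical stand-in for the Freebase search: nouns in, entity IDs out. */
    public interface IdSearch {
        List<String> findIds(List<String> nouns, String language);
    }

    /** Hypothetical stand-in for the DBpedia site, queried by its sameAs value. */
    public interface SameAsSite {
        /** Returns the fields of the matching entity, or null if none is known. */
        Map<String, String> findBySameAs(String freebaseId);
    }

    /** Steps 2 and 3: search IDs in Freebase, then fetch the fields from DBpedia. */
    public static List<Map<String, String>> lookup(List<String> nouns, String language,
            IdSearch freebase, SameAsSite dbpedia) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String id : freebase.findIds(nouns, language)) {
            Map<String, String> fields = dbpedia.findBySameAs(id);
            if (fields != null) { // skip IDs without a sameAs match
                results.add(fields);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // In-memory stubs with placeholder IDs, only to show the call flow.
        IdSearch freebase = (nouns, lang) -> Arrays.asList("fb-id-1", "fb-id-2");
        SameAsSite dbpedia = id -> "fb-id-1".equals(id)
                ? Collections.singletonMap("label", "Example entity")
                : null;
        System.out.println(lookup(Arrays.asList("example"), "en", freebase, dbpedia));
    }
}
```

Keeping the two services behind separate interfaces reflects the split
described above: Freebase is only used to find likely IDs, while all
properties come from DBpedia.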

Does that make it a possible candidate for an EntitySearcher?

BR
David


On Wed, Dec 12, 2012 at 2:09 PM, David Riccitelli <da...@insideout.io> wrote:

> Thanks Rupert,
>
> I'll follow up on the EntitySearcher. Let me know if you need anything
> else from my side.
>
> BR
> David
>
> On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
> rupert.westenthaler@gmail.com> wrote:
>
>> EntitySearcher


-- 
David Riccitelli


Re: InsideOut10 contributed engines

Posted by David Riccitelli <da...@insideout.io>.
Thanks Rupert,

I'll follow up on the EntitySearcher. Let me know if you need anything else
from my side.

BR
David

On Wed, Dec 12, 2012 at 1:46 PM, Rupert Westenthaler <
rupert.westenthaler@gmail.com> wrote:

> EntitySearcher




-- 
David Riccitelli


Re: InsideOut10 contributed engines

Posted by Fabian Christ <ch...@googlemail.com>.
Hi Rupert,

2012/12/12 Rupert Westenthaler <ru...@gmail.com>

> Freeling is licensed under GPLv3, a license that is NOT compatible with
> Apache. This is also true for the Java API (a JNI wrapper over the C
> stuff) and all modules that depend on this API - basically the
> freeling-** modules.
>
> [...]
>
> As I do not have any experience on how to deal with situations like
> this, comments would be very welcome.
>
>

Puh - unfortunately I think this is a no-go. We cannot have something in
the Stanbol source tree that is linked against GPL code. Our source code
has to be checked for such things. IMO there is also no reason to have it
in Stanbol if it cannot be released.

This is a great contribution, but I do not think we can accept it this way.
Or is Freeling planning to change the license in the near future?

Best,
 - Fabian
-- 
Fabian
http://twitter.com/fctwitt