Posted to dev@stanbol.apache.org by Alessio Bosca <al...@celi.it> on 2013/01/09 10:22:48 UTC

Patch for CELI services update (added support for morphology analysis for Swedish, added engine for Sentiment Analysis for French and Italian)

Hi Rupert,

yesterday I updated the morphological analysis service (we added support
for the sv language, fixed the umlaut issue for German, and unified all
the POS tagsets for the supported languages).
As a result, the POS tags currently returned by the web service are not
consistent with the ones declared in the POS tag mappings in the
enhancement engine. For this reason, the tests on the morphological
engine are expected to fail.
Concerning the umlauts (and the ß character), they are now recognized by
the system, but currently the lemma produced by the morphological
analyzer converts them to sequences of characters (e.g. ö -> oe, ß ->
ss).
In a future version of the system I would like to include both lemma
writings.

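For illustration, the conversion described above can be sketched in Java
as follows. This is only a minimal sketch under stated assumptions: the
class name LemmaFolding and its fold method are hypothetical, and the
only replacement pairs confirmed in this message are ö -> oe and ß -> ss.
The other entries in the table are assumed by analogy and may not match
the actual analyzer.

```java
import java.util.Map;

// Hypothetical helper, not part of the CELI services or of Stanbol.
public final class LemmaFolding {

    // Assumed replacement table. Only o-umlaut -> "oe" and sharp s -> "ss"
    // are stated in the message; the rest is filled in by analogy.
    private static final Map<Character, String> REPLACEMENTS = Map.of(
            '\u00e4', "ae",   // a-umlaut (assumed)
            '\u00f6', "oe",   // o-umlaut (from the message)
            '\u00fc', "ue",   // u-umlaut (assumed)
            '\u00c4', "Ae",   // capital a-umlaut (assumed)
            '\u00d6', "Oe",   // capital o-umlaut (assumed)
            '\u00dc', "Ue",   // capital u-umlaut (assumed)
            '\u00df', "ss");  // sharp s (from the message)

    private LemmaFolding() {
    }

    // Replaces each mapped character with its ASCII sequence and
    // copies every other character unchanged.
    public static String fold(String lemma) {
        StringBuilder out = new StringBuilder(lemma.length() + 4);
        for (int i = 0; i < lemma.length(); i++) {
            char c = lemma.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("sch\u00f6n"));   // prints "schoen"
        System.out.println(fold("Stra\u00dfe"));  // prints "Strasse"
    }
}
```

Keeping the original string next to the folded one would be one possible
way to expose both lemma writings.
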
The language identifier has been updated as well (a few new languages
were added; the list of supported languages now includes
en,fr,de,hu,pl,it,es,pt,el,et,lv,tr,pt,ru,ar,ro,da), with no changes
needed in the engines.

The patch in the attachment contains the fixes for the
morphological service updates (the sv language addition and the new POS
tag mapping, a single one for all languages).
The patch also contains the client and the test classes for the
sentiment analysis engine, which supports fr and it.

Let me know if you have any problems integrating the patch.

Alessio

On 01/07/2013 05:44 PM, Alessio Bosca wrote:
> Hi Rupert,
>
> sure, tomorrow I'll have a look into that and let you know.
>
> bests
>     Alessio
>
> On 01/04/2013 01:07 PM, Rupert Westenthaler wrote:
>> Hi Alessio, all
>>
>> Thanks for looking into that. However, with Jenkins build #1200
>> there is still one remaining issue:
>>
>> tesetEngine(org.apache.stanbol.enhancer.engines.celi.langid.impl.CeliLanguageIdentifierEnhancementEngineTest) 
>>
>>   Time elapsed: 0.296 sec  <<< FAILURE!
>> junit.framework.ComparisonFailure: The detected language for text
>> 'Brigitte Bardot, née  le 28 septembre 1934 à Paris, est une actrice
>> de cinéma et chanteuse française.' MUST BE 'fr' expected:<[f]r> but
>> was:<[a]r>
>>          at junit.framework.Assert.assertEquals(Assert.java:100)
>>          at 
>> org.apache.stanbol.enhancer.engines.celi.langid.impl.CeliLanguageIdentifierEnhancementEngineTest.tesetEngine(CeliLanguageIdentifierEnhancementEngineTest.java:101)
>>
>> Looks like the Language Identification engine detects the language
>> Arabic for the French text example. I am also able to reproduce this
>> issue locally.
>>
>> Can you have a look into that?
>> best
>> Rupert
>>
>>
>> On Fri, Jan 4, 2013 at 10:03 AM, Alessio Bosca 
>> <al...@celi.it> wrote:
>>> Hi Rupert,
>>>
>>> thanks for the feedback. There was a problem with the access
>>> control for anonymous users in the services. It has now been fixed.
>>>
>>> PS: Next week I'll send you a patch for CELI engines with the sentiment
>>> analysis and bug fixes for the German umlauts.
>>> Sorry for the delay in the release.
>>>
>>> Bests,
>>>      Alessio
>>>
>>> -- 
>>> *************************************
>>> Alessio Bosca, Ph.D.
>>> CELI s.r.l.
>>> Via San Quintino 31
>>> 10121 Torino
>>> Tel. +39 011.562.71.15
>>> Fax +39 011.506.40.86
>>> http://www.celi.it
>>> *************************************
>>>
>>>
>>
>>
>
>


-- 
*************************************
Alessio Bosca, Ph.D.
CELI s.r.l.
Via San Quintino 31
10121 Torino
Tel. +39 011.562.71.15
Fax +39 011.506.40.86
http://www.celi.it
*************************************


Re: Patch for CELI services update (added support for morphology analysis for Swedish, added engine for Sentiment Analysis for French and Italian)

Posted by Alessio Bosca <al...@celi.it>.
Hi All,

I forgot to mention that the svn diff patch was created against the
stanbol-nlp-processing branch.

Alessio


