Posted to commits@stanbol.apache.org by "Alessio Bosca (Created) (JIRA)" <ji...@apache.org> on 2012/04/11 14:33:24 UTC

[jira] [Created] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

CELI enhancement engine(s)  - Contribution to stanbol
-----------------------------------------------------

                 Key: STANBOL-583
                 URL: https://issues.apache.org/jira/browse/STANBOL-583
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancer
    Affects Versions: 0.9.0-incubating
         Environment: Enhancement Engines developed as web service clients
            Reporter: Alessio Bosca
            Priority: Minor
             Fix For: 0.9.0-incubating


The services included so far in the module as Enhancement Engines are:
- a Named Entity Recognition service for French
- a Lemmatizer for Italian, German, Romanian, Russian, Danish (it creates an annotation on the document whose content is the lemmatized form of the document)
- a Language Identifier for Italian, French, German, Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish, Arabic, Russian, Turkish, Romanian, Greek, Norwegian
- a Document Classification service for Italian, French, German, English, Spanish, Portuguese that associates a document with DBpedia classes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13288358#comment-13288358 ] 

Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------

Hi Alessio,

While testing I found another server-side issue.

When an illegally formatted license key is configured (one that is not in the form '{user-name}:{password}'), the CELI server answers with "200 OK" but sends a plain-text error message as content. This then results in a rather unrelated exception

    Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.

because valid XML is expected.

In my opinion the server should return an HTTP 4xx status (e.g. 400 Bad Request) in those cases.

I have written a corresponding unit test [1], but have deactivated it for now so that it does not break the build.

NOTE: This could also be solved by validating the license key parameter within the activate methods of the CELI engines. If you prefer this option, I would add a corresponding utility method to the "org.apache.stanbol.enhancer.engines.celi.utils.Utils" class.


[1] http://svn.apache.org/repos/asf/incubator/stanbol/trunk/enhancer/engines/celi/src/test/java/org/apache/stanbol/enhancer/engines/celi/CeliHttpTest.java
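
The client-side validation mentioned in the NOTE could look roughly like the following. This is only a sketch: the class and method names are hypothetical (they are not taken from the CELI module), the first ':' is assumed to be the separator, and a real activate method would probably report the problem through the OSGi configuration mechanism rather than an IllegalArgumentException.

```java
/** Hypothetical helper; not part of the contributed CELI module. */
final class LicenseKeyCheck {

    private LicenseKeyCheck() {
    }

    /**
     * Returns true if the key looks like {user-name}:{password}: a non-empty
     * user name, a ':' separator and a non-empty password. The first ':' is
     * treated as the separator (an assumption of this sketch).
     */
    static boolean isWellFormed(String licenseKey) {
        if (licenseKey == null) {
            return false;
        }
        int separator = licenseKey.indexOf(':');
        return separator > 0 && separator < licenseKey.length() - 1;
    }

    /** Fails fast, before any request is sent to the remote service. */
    static String requireWellFormed(String licenseKey) {
        if (!isWellFormed(licenseKey)) {
            throw new IllegalArgumentException(
                "License key must have the form {user-name}:{password}");
        }
        return licenseKey;
    }
}
```

Calling requireWellFormed(..) during activation would turn the misleading SAXParseException into an error message that names the actual cause.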
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255394#comment-13255394 ] 

Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------

I was not able to work on this before leaving for WWW2012, so I will not be able to get to it until next weekend. If someone can work on it this week, that would be great.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289390#comment-13289390 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Hi Rupert,

thanks for the feedback. We fixed the problem on the server side: a 4xx HTTP status is now returned when the license key is not well formed (and therefore not correct).

Alessio
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251505#comment-13251505 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

A demo installation with the submitted enhancement engines is available at http://research.celi.it:8082/
For any problem or feedback: alessio.baosca@celi.it  
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Updated] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessio Bosca updated STANBOL-583:
----------------------------------

    Attachment: celi.zip

Code of the enhancement engines (the archive expects to be placed in the project tree under enhancer/engines/ so that the parent POM location resolves correctly)
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273159#comment-13273159 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Hi Rupert,

Sorry for the delay in updating the patch.
1) checked out the project from svn (11 May) and validated your changes
2) added support for Online mode and the related dependency. I tested it locally and the engines are not loaded when Stanbol is started with -Dorg.apache.stanbol.offline.mode=true
3) explicitly specified the charset encoding as UTF-8; this should fix the issues you encountered. Could you please check whether it works on your system? (I don't have a Mac to test it on)
4) removed the reference to prefixes in the XML response parsing
5) the enhancement engines now use a single **ClientHTTP instance

Let me know if the patch is fine.

Best, Alessio

                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Updated] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler updated STANBOL-583:
----------------------------------------

    Attachment: STANBOL-583-celi-engines_20120423_rwesten.patch

NOTE: The originally attached zip archive was not a patch but an archive of the source tree. Because it adds a new EnhancementEngine, I was still able to apply it correctly by extracting the archive, copying it to /enhancer/engines and removing all svn metadata.


Created a new patch that includes the following changes:

* Applied some minor changes necessary to compile with recent changes within the trunk.
* Dependencies
    * changed the dependency on the Apache Commons httpclient to the OSGi bundle version "httpclient-osgi"
    * removed the unused dependency on OpenNLP
    * now there are no embedded dependencies
* Logging
    * changed the Logger API from Apache log4j to SLF4J, the logging framework used by Apache Stanbol.
    * Logging in the tests still uses log4j via SLF4J

TODOs/Questions:

1. A Stanbol EnhancementEngine MUST support "offline mode": this ensures that no connections to external services are made if Stanbol is started in offline mode (-Dorg.apache.stanbol.offline.mode=true). EnhancementEngines that require an external service then need to deactivate themselves. This is most easily achieved by adding

    @Reference
    private OnlineMode onlineMode;

as the OnlineMode service will only be available if OfflineMode is deactivated. 

You will also need to add

    <dependency>
        <groupId>org.apache.stanbol</groupId>
        <artifactId>org.apache.stanbol.commons.stanboltools.offline</artifactId>
        <scope>provided</scope>
    </dependency>

2. While all unit tests succeed I noticed exceptions like

    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
        ...

indicating that the character encoding of the received data is not UTF-8. In fact, the responses of the service do not specify any encoding:

    <?xml version="1.0" ?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><ns2:guessLanguageResponse ...

However, I think this is related to how the response data is processed by the **ClientHTTP.java classes.

* the "doPost(..)" method returns a String and uses "UTF-8" to decode the String from the received bytes. So far so good.
* the calling method then creates another ByteArrayInputStream for the returned String by using String.getBytes(). This creates the byte[] representation of the String using the platform encoding ("MAC Roman" in my case).
* This stream is then passed to SOAPPart#setContent(...). I assume that, because the XML string does not include an explicit charset, the implementation will use UTF-8 to parse the "MAC Roman" encoded byte sequence.

I would suggest changing the doPost(..) method to return the InputStream and passing this stream directly to SOAPPart#setContent(...).
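
The effect described above can be reproduced with the JDK alone. The sketch below is not the code of the **ClientHTTP classes; the class and method names are hypothetical, and ISO-8859-1 stands in for a non-UTF-8 platform encoding such as "MAC Roman" (which is not guaranteed to be available in every JRE).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the decode/re-encode round trip. */
final class CharsetRoundTrip {

    private CharsetRoundTrip() {
    }

    /** Problematic pattern: decode as UTF-8, then re-encode with another charset. */
    static byte[] reencode(byte[] utf8Response, Charset platformCharset) {
        String decoded = new String(utf8Response, StandardCharsets.UTF_8);
        // String.getBytes() without an argument would use the platform default here
        return decoded.getBytes(platformCharset);
    }

    /** Suggested pattern: hand the received bytes on unchanged. */
    static InputStream passThrough(byte[] utf8Response) {
        return new ByteArrayInputStream(utf8Response);
    }

    /** Strict UTF-8 check: malformed input is reported instead of replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

With a response containing a non-ASCII character, the re-encoded bytes are no longer valid UTF-8, which matches the MalformedByteSequenceException shown earlier; either passing the stream through or always calling getBytes with an explicit UTF-8 charset avoids the mismatch.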

3. I noticed that a **ClientHTTP instance is created for each request. I would rather expect a single instance to be created during engine activation, or am I missing a good reason why it is better to create a new instance for each enhancement request?

4. The ClassificationClientHTTP uses "ns2:label" and "ns2:score" to access the data. This seems dangerous, as the prefixes may depend on the XML framework in use and might change over time. I would suggest explicitly referring to the namespace "http://linguagrid.org/v20110204/commons" instead.
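
A namespace-based lookup along the lines of point 4 might be sketched as follows. It uses the plain JDK DOM API rather than the SOAP classes of the engine, the class and method names are hypothetical, and the element layout of the real service response is not shown in this thread, so any XML fed to it here is illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: looks elements up by namespace URI, not by prefix. */
final class NamespaceLookup {

    static final String COMMONS_NS = "http://linguagrid.org/v20110204/commons";

    private NamespaceLookup() {
    }

    /** Returns the text of the first element with the given local name, or null. */
    static String firstText(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // without this, getElementsByTagNameNS cannot match on the namespace
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(COMMONS_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse response", e);
        }
    }
}
```

The same call returns the label whether the response binds the namespace to "ns2" or to any other prefix.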

Alessio Bosca, can you please

1. validate that my changes work with the current trunk
2. validate that my changes to the dependencies do not break the engines
3. add support for Offline Mode
4. have a look at the char encoding issues I encountered

On my TODO list are

1. validation of the created RDF (TextAnnotations, EntityAnnotations, TopicAnnotations)
2. read/write locks on the ContentItem and the metadata (necessary because you return "ENHANCE_ASYNC" in the canEnhance(..) method)
3. testing the engines on a Stanbol instance within a real EnhancementChain.
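
The locking mentioned in TODO item 2 follows a common pattern, sketched below with the JDK's ReadWriteLock. The LockedItem class is a hypothetical stand-in for a ContentItem and its metadata; how Stanbol itself exposes the lock is not shown in this thread, so nothing here should be read as the Stanbol API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical stand-in for a content item shared by asynchronous engines. */
final class LockedItem {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final String text;
    private final List<String> metadata = new ArrayList<>();

    LockedItem(String text) {
        this.text = text;
    }

    /** Reads under the read lock, calls the service unlocked, writes under the write lock. */
    void enhance(Function<String, String> remoteService) {
        String input;
        lock.readLock().lock();
        try {
            input = text;
        } finally {
            lock.readLock().unlock();
        }

        // no lock is held during the (potentially slow) remote call
        String result = remoteService.apply(input);

        lock.writeLock().lock();
        try {
            metadata.add(result);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns a copy of the metadata taken under the read lock. */
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(metadata);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Keeping the remote call outside both locks means that other engines running in parallel are blocked only for the short read and write sections.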
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Updated] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessio Bosca updated STANBOL-583:
----------------------------------

    Attachment: STANBOL-583-celi-engines_20120511_abosca.patch

Patch created against the svn version of 11 May 2012
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Assigned] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler reassigned STANBOL-583:
-------------------------------------------

    Assignee: Rupert Westenthaler
    
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287270#comment-13287270 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Fixed problem caused by invalid XML character on service side.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285682#comment-13285682 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Dear Rupert,

we have fixed the service-side problems with the start/end positions in the text and the issue with the returned formKind. We have also added support for Italian NER.
I made a few changes in the client (an extra parameter for the language) and have just submitted a patch for that.

Alessio
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286520#comment-13286520 ] 

Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------

Hi Alessio,

While testing I found another bug in your server implementation. In revision 1344669 I added another unit test to the NER engine that reproduces it nicely.

The root cause is

    Caused by: org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x19) was found in the element content of the document.

while creating the 

    SOAPBody soapBody = message.getSOAPBody();

for the response of the NER service (NERserviceClientHTTP). Based on a short Google search, I assume that the server does not correctly escape special characters in the labels of detected entities. Most posts suggest that using "StringEscapeUtils.escapeXml(..)" solves this.
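
One point to keep in mind: U+0019 lies outside the character range that XML 1.0 permits, and XML 1.0 does not allow such a character to be written as a character reference either, so escaping alone may not be enough. The sketch below shows an alternative, removing disallowed characters before serialization. The class name is hypothetical, and the thread does not say which approach the CELI server eventually took.

```java
/** Hypothetical helper that drops characters not allowed in XML 1.0 documents. */
final class XmlChars {

    private XmlChars() {
    }

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns the input without any code point that XML 1.0 disallows. */
    static String stripIllegal(String value) {
        StringBuilder cleaned = new StringBuilder(value.length());
        value.codePoints()
                .filter(XmlChars::isLegal)
                .forEach(cleaned::appendCodePoint);
        return cleaned.toString();
    }
}
```

Applied to an entity label before it is written into the response, this keeps tabs and line breaks but removes control characters such as 0x19.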

NOTE: This does not block this issue, as it does not affect the contributed Engine.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Resolved] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-583.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: enhancer-0.10.0-incubating

The CELI engines are now ready to use:

* They are included in the Enhancer Bundle List.
* Users need to add the license key or enable usage of the test account. After doing this, the engine(s) must be manually stopped and started in the Components tab of the Felix Web Console ({host}/system/console/components).

NOTE: The license key can be set for all engines via a system property (-Dceli.license={user}:{pwd}) or by adding it to the sling.properties file in the Stanbol root directory.

A big thanks to Alessio and the CELI team for their contributions!
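
To illustrate the {user}:{pwd} format of that property, here is a minimal sketch of how such a value could be read and split. Only the property name celi.license and the format come from the note above; the class CeliLicense, its fields, and the choice to split at the first colon are assumptions for illustration, not the code the engines actually use.

```java
// Hypothetical holder for the license key (not the actual engine code).
final class CeliLicense {

    final String user;
    final String password;

    private CeliLicense(String user, String password) {
        this.user = user;
        this.password = password;
    }

    /**
     * Parses a value of the form user:pwd.
     * Returns null if no value is set; splits at the first colon (an assumption).
     */
    static CeliLicense parse(String value) {
        if (value == null || value.isEmpty()) {
            return null;
        }
        int sep = value.indexOf(':');
        if (sep <= 0 || sep == value.length() - 1) {
            throw new IllegalArgumentException("expected {user}:{pwd}");
        }
        return new CeliLicense(value.substring(0, sep), value.substring(sep + 1));
    }

    public static void main(String[] args) {
        // Set on the command line with -Dceli.license={user}:{pwd}
        CeliLicense license = parse(System.getProperty("celi.license"));
        if (license == null) {
            System.out.println("no license configured");
        } else {
            System.out.println("license configured for user " + license.user);
        }
    }
}
```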
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: enhancer-0.10.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Rupert Westenthaler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286314#comment-13286314 ] 

Rupert Westenthaler commented on STANBOL-583:
---------------------------------------------

Hi Alessio,

After applying the patch, the unit tests complete successfully. I plan to go over all the engines again and test them within a running Stanbol instance later today.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273249#comment-13273249 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Since I forgot to select the option when I uploaded my last patch, I want to explicitly grant the Apache License 2.0 for "STANBOL-583-celi-engines_20120511_abosca.patch" as attached to STANBOL-583.

Alessio
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Commented] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259472#comment-13259472 ] 

Alessio Bosca commented on STANBOL-583:
---------------------------------------

Hi Rupert,

Thanks for the work on integrating the engine and for the feedback. I'll
work on the suggested to-do list and send you an update as soon as it is
ready (I should be able to send you a new version by Thursday).

Bests
     Alessio



-- 
*************************************
Alessio Bosca, Ph.D.
CELI s.r.l.
Via San Quintino 31
10121 Torino
Tel. +39 011.562.71.15
Fax +39 011.506.40.86
http://www.celi.it
*************************************



                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.9.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Updated] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Fabian Christ (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-583:
----------------------------------

    Fix Version/s:     (was: 0.9.0-incubating)
                   0.10.0-incubating
    
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.10.0-incubating
>
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 


[jira] [Updated] (STANBOL-583) CELI enhancement engine(s) - Contribution to stanbol

Posted by "Alessio Bosca (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/STANBOL-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessio Bosca updated STANBOL-583:
----------------------------------

    Attachment: celiPatchNER.patch

Added support for Italian.
                
> CELI enhancement engine(s)  - Contribution to stanbol
> -----------------------------------------------------
>
>                 Key: STANBOL-583
>                 URL: https://issues.apache.org/jira/browse/STANBOL-583
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>         Environment: Enhancement Engines developed as web service clients
>            Reporter: Alessio Bosca
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>              Labels: patch
>         Attachments: STANBOL-583-celi-engines_20120423_rwesten.patch, STANBOL-583-celi-engines_20120511_abosca.patch, celi.zip, celiPatchNER.patch
>
>
> The services included so far in the module as Enhancement Engines are:
> - a Named Entity Recognition service for French
> - a Lemmatizer for Italian, German, Romanian, Russian, Danish  (it creates an annotation on the document whose content is the lemmatized form of the document)
> - a Language Identifier for Italian, French,German,Spanish, Portuguese, Polish, Hungarian, Dutch, Swedish,Arabic, Russian,Turkish, Romanian, Greek, Norwegian
> - a Document Classification services for Italian, French, German, English, Spanish, Portuguese that associates a document to DBPedia classes 
