Posted to commits@stanbol.apache.org by "Fabian Christ (JIRA)" <ji...@apache.org> on 2012/05/30 16:30:23 UTC

[jira] [Updated] (STANBOL-613) Define a standard way on how to obtain the extracted language

     [ https://issues.apache.org/jira/browse/STANBOL-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-613:
----------------------------------

    Component/s: Engine - LangID
    
> Define a standard way on how to obtain the extracted language
> -------------------------------------------------------------
>
>                 Key: STANBOL-613
>                 URL: https://issues.apache.org/jira/browse/STANBOL-613
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Engine - LangID, Enhancer
>    Affects Versions: 0.9.0-incubating
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>             Fix For: enhancer-0.10.0-incubating
>
>
> With the addition of the CELI Language Identification Engine, there are now two different engines that support the same feature.
> However, engines that consume the detected language are currently hard-coded to the LangId Engine (enhancer/engines/langid). This needs to change to allow the adoption of alternatives, such as the CELI-based implementation.
> The suggestion is to use the following pattern to extract the language:
> (1) via Annotations:
>   ?x rdf:type fise:TextAnnotation .
>   ?x dc:language ?language .
>   OPTIONAL {
>     ?x dc:created ?engine
>   }
>   OPTIONAL {
>     ?x fise:confidence ?confidence
>   }
> (2) via ContentItem metadata
>   ?ci dc:language ?language
> (2) is a fallback if (1) delivers no results.
> The following methods are added to the EnhancementEngineHelper utility of the enhancer.servicesapi module:
>  * extract the language (with the highest confidence) - including fallback to (2)
>  * extract all languages (sorted by confidence) - including fallback to (2)
>  * extract all TextAnnotations with dc:language values
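
The selection logic described above (prefer annotations, fall back to the ContentItem metadata) can be sketched as follows. This is only an illustration in Python, not the EnhancementEngineHelper code: the LanguageAnnotation class and both function names are hypothetical, and ranking annotations without a fise:confidence value last is an assumption the issue does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LanguageAnnotation:
    """Hypothetical stand-in for a fise:TextAnnotation with a dc:language value."""
    language: str
    confidence: Optional[float] = None  # fise:confidence is OPTIONAL in pattern (1)
    engine: Optional[str] = None        # the engine binding is OPTIONAL as well


def languages_by_confidence(
    annotations: List[LanguageAnnotation],
    content_item_language: Optional[str] = None,
) -> List[str]:
    """Return distinct languages, highest confidence first.

    Assumption: annotations without a confidence value are ranked last.
    """
    ranked = sorted(
        annotations,
        key=lambda a: a.confidence if a.confidence is not None else -1.0,
        reverse=True,
    )
    result: List[str] = []
    for annotation in ranked:
        if annotation.language not in result:
            result.append(annotation.language)
    # (2) is only consulted when (1) delivered no results.
    if not result and content_item_language is not None:
        result.append(content_item_language)
    return result


def best_language(
    annotations: List[LanguageAnnotation],
    content_item_language: Optional[str] = None,
) -> Optional[str]:
    """Return the language with the highest confidence, or None if unknown."""
    languages = languages_by_confidence(annotations, content_item_language)
    return languages[0] if languages else None
```

A consuming engine written this way would not need to know whether the annotations came from the LangId Engine or the CELI engine; it only reads the dc:language values and, if present, their confidence.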

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira