Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/11/03 09:28:12 UTC

[jira] [Resolved] (STANBOL-792) Extend the NamedEntityExtraction engine to support custom NameFinder Models

     [ https://issues.apache.org/jira/browse/STANBOL-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-792.
-----------------------------------------

    Resolution: Fixed

implemented with http://svn.apache.org/viewvc?rev=1405306&view=rev
documentation http://svn.apache.org/viewvc?rev=1405305&view=rev
published with revision 837122
                
> Extend the NamedEntityExtraction engine to support custom NameFinder Models
> ---------------------------------------------------------------------------
>
>                 Key: STANBOL-792
>                 URL: https://issues.apache.org/jira/browse/STANBOL-792
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> This adds a new NER engine that allows custom NER models (OpenNLP TokenNameFinderModels) to be configured.
> The configuration uses two properties:
> * __Name Finder Models__ _(stanbol.engines.opennlp-ner.nameFinderModels)_: The list of custom NameFinderModels used by this engine. The engine supports Arrays, Vectors and comma-separated Strings for this property. Values are the file names of the NameFinderModel files. Configured files are loaded by using the DataFileProvider service. This means that model files can be provided by copying them into the 'datafiles' folder (by default located at '{stanbol-working-dir}/stanbol/datafiles').
> * __Named Entity to 'dc:type' Mappings__ _(stanbol.engines.opennlp-ner.typeMappings)_: This configuration uses the syntax "{named-entity-type} > {uri}": {named-entity-type} matches the string "name" used for the named entity type in the OpenNLP NameFinder model. {uri} MUST BE a valid URI and is used as the dc:type value for fise:TextAnnotations created by the engine for extracted Named Entities. NOTE: TextAnnotations for unmapped Named Entity Types will have no dc:type information.
> Example:
> The following configuration uses the '.config' format and needs to be provided to the Sling FileInstaller (by default {stanbol-working-dir}/stanbol/fileinstall) in a file with a name similar to 'org.apache.stanbol.enhancer.engines.opennlp.impl.CustomNERModelEnhancementEngine-{component-instance-name}.config':
>     stanbol.enhancer.engine.name="ehealth-ner"
>     stanbol.engines.opennlp-ner.nameFinderModels=["bionlp2004-DNA-en.bin","bionlp2004-protein-en.bin","bionlp2004-cell_type-en.bin","bionlp2004-cell_line-en.bin","bionlp2004-RNA-en.bin"]
>     stanbol.engines.opennlp-ner.typeMappings=["DNA\ >\ http://www.bootstrep.eu/ontology/GRO#DNA","RNA\ >\ http://www.bootstrep.eu/ontology/GRO#RNA","protein\ >\ http://www.bootstrep.eu/ontology/GRO#Protein","cell_type\ >\ http://purl.bioontology.org/ontology/CL","cell_line\ >\ http://purl.bioontology.org/ontology/MCCL"]
> NOTE: the '.config' format requires spaces to be escaped with '\'.
> Documentation of the Engine is available at http://stanbol.apache.org/docs/trunk/components/enhancer/engines/customnermodelengine.html
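
The mapping syntax and the space-escaping rule described above can be checked with a small script before a '.config' file is dropped into the fileinstall folder. The following Python sketch is not part of Stanbol: the helper names (parse_type_mapping, escape_config_value, render_config_array) are hypothetical, the URI check only tests for a scheme, and only the space escaping mentioned in this issue is handled.

```python
from urllib.parse import urlparse


def parse_type_mapping(entry):
    """Split a '{named-entity-type} > {uri}' entry into (type, uri).

    Hypothetical helper; the URI check is a simplification that only
    requires a scheme such as 'http'.
    """
    name, sep, uri = entry.partition(">")
    name, uri = name.strip(), uri.strip()
    if not sep or not name or not uri:
        raise ValueError("expected '{named-entity-type} > {uri}': %r" % entry)
    if not urlparse(uri).scheme:
        raise ValueError("not a valid URI: %r" % uri)
    return name, uri


def escape_config_value(value):
    """Escape spaces with a backslash, as the '.config' format requires."""
    return value.replace(" ", "\\ ")


def render_config_array(key, values):
    """Render one 'key=["v1","v2"]' line with escaped values."""
    items = ",".join('"%s"' % escape_config_value(v) for v in values)
    return "%s=[%s]" % (key, items)


if __name__ == "__main__":
    mappings = [
        "DNA > http://www.bootstrep.eu/ontology/GRO#DNA",
        "RNA > http://www.bootstrep.eu/ontology/GRO#RNA",
    ]
    for mapping in mappings:
        parse_type_mapping(mapping)  # raises ValueError on malformed entries
    print(render_config_array("stanbol.engines.opennlp-ner.typeMappings", mappings))
```

Validating each entry before rendering surfaces a missing '>' or a malformed URI early. Per the note above, named entity types without a mapping get no dc:type on their TextAnnotations.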
