Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2012/10/18 07:44:03 UTC

[jira] [Commented] (STANBOL-739) Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart

    [ https://issues.apache.org/jira/browse/STANBOL-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478696#comment-13478696 ] 

Rupert Westenthaler commented on STANBOL-739:
---------------------------------------------

Big thanks for the patch and sorry for the delay, but I had to work on other Stanbol things. I have successfully applied your patch to the trunk and plan to work on it in the coming days. 

### Regarding Olia POS property (and other similar things):

I already discussed this with Sebastian Hellmann. The suggestion was to add AnnotationProperties to the String Ontology that allow direct linking from a Word to the LexicalCategory (e.g. "string:lexicalCategory" or "string:posClass").

Here is the detailed description:

OWL ontologies cannot use properties to link to Classes (only to instances). Because of that, LexicalCategories are specified in OLIA as Classes, while the "Tag"s of POS TagSets are modelled as instances (of the POS classes). The String ontology has the olialink property, and this property can be used to link to the "Tag".

Such a link is useful if you assume that the consumer of the RDF graph uses an OWL reasoner with the OLIA, String and Mapping ontologies for the used POS TagSet loaded. It is not very meaningful for users who lack this kind of infrastructure.

Because of that, I discussed with Sebastian Hellmann the addition of an owl:AnnotationProperty to the String Ontology that will allow linking a Word directly with the POS Classes defined by OLIA (entries of the LexicalCategory enumeration). AnnotationProperties can be used for such things as they MUST BE ignored by any OWL Reasoner.
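To make the difference between the two linking styles concrete, here is a minimal sketch in plain Java that only formats triples as strings. All prefixed names are illustrative assumptions: "string:posClass" is just one of the candidate property names mentioned above, and "ex:word1", "penn:NN" and "olia:Noun" are placeholders, not verified URIs. No RDF library is used.

```java
import java.util.ArrayList;
import java.util.List;

public class PosLinkSketch {

    /** Formats one triple in a Turtle-like prefixed notation. */
    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    /**
     * Indirect style: the word links to a Tag instance. The POS class is
     * only reachable through the rdf:type statement, which would normally
     * come from the mapping ontology of the tag set (hypothetical names).
     */
    static List<String> viaTagInstance(String word, String tag, String posClass) {
        List<String> triples = new ArrayList<>();
        triples.add(triple(word, "string:olialink", tag));
        triples.add(triple(tag, "rdf:type", posClass));
        return triples;
    }

    /**
     * Direct style: an annotation property (assumed name) links the word
     * straight to the POS class, so no reasoner or mapping is needed.
     */
    static List<String> viaAnnotationProperty(String word, String posClass) {
        List<String> triples = new ArrayList<>();
        triples.add(triple(word, "string:posClass", posClass));
        return triples;
    }

    public static void main(String[] args) {
        viaTagInstance("ex:word1", "penn:NN", "olia:Noun").forEach(System.out::println);
        viaAnnotationProperty("ex:word1", "olia:Noun").forEach(System.out::println);
    }
}
```

With the direct style a consumer can read the POS class from a single statement about the word, without loading the OLIA and mapping ontologies first.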

### Regarding "LexicalCategory":

I will probably add some additional Categories while adding support for the hierarchical structure defined by the Ontology to the Enumeration (see the enumeration for Tenses as an example). Another possibility would be to define a second (hierarchical) Enumeration with all POS tags defined by OLIA and map those to the ones currently defined in the LexicalCategory Enumeration. This would make it easier for Components where the granularity of the current LexicalCategories is sufficient.
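The second option could look roughly like the following sketch. It is not the actual Stanbol code: the enum name "Pos", its constants, and the reduced "LexicalCategory" stand-in are hypothetical and only cover a handful of values for illustration. Each fine-grained entry knows its parent, and the coarse category is resolved by walking up the hierarchy.

```java
public class PosHierarchySketch {

    /** Simplified stand-in for the existing coarse-grained enumeration. */
    enum LexicalCategory { Noun, Verb, Adjective }

    /** Hypothetical hierarchical enumeration of fine-grained POS classes. */
    enum Pos {
        Noun(null, LexicalCategory.Noun),
        CommonNoun(Noun, null),
        ProperNoun(Noun, null),
        Verb(null, LexicalCategory.Verb),
        FiniteVerb(Verb, null),
        Adjective(null, LexicalCategory.Adjective);

        private final Pos parent;
        private final LexicalCategory category;

        Pos(Pos parent, LexicalCategory category) {
            this.parent = parent;
            this.category = category;
        }

        /** Returns the nearest coarse category found on the path to the root. */
        LexicalCategory category() {
            for (Pos p = this; p != null; p = p.parent) {
                if (p.category != null) {
                    return p.category;
                }
            }
            throw new IllegalStateException("No LexicalCategory mapped for " + this);
        }

        /** True if this entry equals the given one or is one of its descendants. */
        boolean isA(Pos other) {
            for (Pos p = this; p != null; p = p.parent) {
                if (p == other) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Pos.ProperNoun.category());
        System.out.println(Pos.FiniteVerb.isA(Pos.Verb));
    }
}
```

A component that only needs the coarse granularity would call category() and ignore the rest, while a component that cares about the hierarchy can use isA().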

best
Rupert
                
> Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart
> ----------------------------------------------------------------------
>
>                 Key: STANBOL-739
>                 URL: https://issues.apache.org/jira/browse/STANBOL-739
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>         Attachments: myPatch.diff
>
>
> The CELI Lemmatizer enhancement engine currently writes its results directly to the metadata of the ContentItem. As the new AnalyzedText content part is much better suited to represent those data, this Engine should be adapted to use the new content part.
