Posted to commits@stanbol.apache.org by "Sebastian Schaffert (JIRA)" <ji...@apache.org> on 2012/09/17 16:39:07 UTC

[jira] [Updated] (STANBOL-733) Stanbol NLP processing

     [ https://issues.apache.org/jira/browse/STANBOL-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Schaffert updated STANBOL-733:
----------------------------------------

    Attachment: srfgkmt-stanbol-nlp.zip

A patch containing NLP enhancement engines for Apache Stanbol that address the goals mentioned in the issue. The patch excludes all data files; they can be found at https://www.dropbox.com/home/Public/stanbol
                
> Stanbol NLP processing
> ----------------------
>
>                 Key: STANBOL-733
>                 URL: https://issues.apache.org/jira/browse/STANBOL-733
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>         Attachments: srfgkmt-stanbol-nlp.zip
>
>
> This issue covers the NLP processing components as discussed in http://markmail.org/message/qxusiup3mim2lhpx
> Goals
> =====
> 1. provide a modular infrastructure for NLP-related things
> Many tasks in NLP can be computationally intensive, and there is no "one fits
> all" NLP approach when analysing text. Therefore, we wanted to have an NLP
> infrastructure that can be configured and wired together as needed for the
> specific use case, with several specialised modules that can build upon each
> other but many of which are optional. 
> 2. provide a unified data model for representing NLP text annotations
> In many scenarios, it will be necessary to implement custom engines building on
> the results of a previous "generic" analysis of the text (e.g. POS tagging and
> chunking). For example, in one project we identify so-called "noun
> phrases", use a lemmatizer to build the base form, and then convert this to
> singular nominative form to obtain a grammatically correct label for use in a
> tag cloud. Most of this builds on generic NLP functionality, but the last step
> is very specific to the use case.
> Therefore, we also wanted to implement a generic NLP data model that can
> represent text annotations attached either to individual words or to spans of
> words.
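
To make the second goal more concrete, here is a minimal sketch of what such a data model could look like. It is not the data model from the attached patch and not Stanbol's API: the names AnalysedText, Span, add_span, covered_text and spans_within are hypothetical and chosen only for illustration, and Python is used purely for brevity. The sketch assumes annotations are stored as key/value pairs on character ranges, so that a single word and a multi-word chunk are represented the same way.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    """Half-open character range [start, end) with attached annotations."""
    start: int
    end: int
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AnalysedText:
    """Hypothetical container: the raw text plus all spans annotated so far."""
    text: str
    spans: List[Span] = field(default_factory=list)

    def add_span(self, start: int, end: int, **annotations: Any) -> Span:
        # Reject ranges that are empty or fall outside the text.
        if not (0 <= start < end <= len(self.text)):
            raise ValueError(f"invalid span [{start}, {end})")
        span = Span(start, end, dict(annotations))
        self.spans.append(span)
        return span

    def covered_text(self, span: Span) -> str:
        return self.text[span.start:span.end]

    def spans_within(self, outer: Span) -> List[Span]:
        # Spans fully contained in `outer`, excluding `outer` itself.
        return [s for s in self.spans
                if s is not outer and outer.start <= s.start and s.end <= outer.end]


# A "generic" engine annotates single words (POS tags, lemma) ...
doc = AnalysedText("The quick brown foxes jump")
doc.add_span(0, 3, pos="DET")
doc.add_span(4, 9, pos="ADJ")
doc.add_span(10, 15, pos="ADJ")
doc.add_span(16, 21, pos="NOUN", lemma="fox")
# ... and a chunker annotates a span of several words as a noun phrase.
noun_phrase = doc.add_span(0, 21, chunk="NP")

# A use-case-specific engine can then build on both levels of annotation.
words = doc.spans_within(noun_phrase)
head_lemma = [w.annotations["lemma"] for w in words if "lemma" in w.annotations]
```

In this sketch, a later engine (for example, one that derives tag cloud labels) only needs to read the annotations left by earlier engines, which is the kind of layering the goal describes.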

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira