Posted to dev@stanbol.apache.org by Erik Antelman <ea...@gmail.com> on 2013/07/23 19:51:36 UTC

Entity Tagging by RE pattern

I know that several pipelines (GATE and several commercial NLP pipelines)
annotate dates, measurements, addresses, etc.:
http://gate.ac.uk/sale/tao/splitap6.html#x36-736000F.7.

I would like to have the same kind of rule/regex pattern annotation
capability in a Stanbol chain. I would rather not just throw in GATE, but am
thinking that perhaps the regex primitive in the Jena rule engine, or SPARQL
CONSTRUCT statements, could achieve the same capability with existing core
components.

Obviously such an engine could utilize language/locale data to select
pattern variants and to some degree the POS annotations.
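
[Editor's sketch] To make the idea of locale-dependent pattern variants
concrete, here is a minimal Java sketch. The class name, its methods and the
date patterns are hypothetical illustrations; none of them are part of
Stanbol, GATE or Jena. It only shows a lookup from a language tag to a regex
variant with a fallback.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: choose a regex variant by the detected language of the text. */
public class LocalePatternSketch {

    private final Map<String, Pattern> byLanguage = new HashMap<>();
    private final Pattern fallback;

    public LocalePatternSketch(String fallbackRegex) {
        this.fallback = Pattern.compile(fallbackRegex);
    }

    /** Registers a variant for a language tag such as "de" or "en-US". */
    public void register(String languageTag, String regex) {
        byLanguage.put(languageTag.toLowerCase(Locale.ROOT), Pattern.compile(regex));
    }

    /** Tries the full tag, then its primary subtag, then the fallback pattern. */
    public Pattern forLanguage(String languageTag) {
        if (languageTag != null) {
            String tag = languageTag.toLowerCase(Locale.ROOT);
            Pattern p = byLanguage.get(tag);
            if (p != null) {
                return p;
            }
            int dash = tag.indexOf('-');
            if (dash > 0) {
                p = byLanguage.get(tag.substring(0, dash));
                if (p != null) {
                    return p;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Assumed example patterns: ISO dates as fallback, dotted dates for German.
        LocalePatternSketch dates = new LocalePatternSketch("\\d{4}-\\d{2}-\\d{2}");
        dates.register("de", "\\d{1,2}\\.\\d{1,2}\\.\\d{4}");

        Matcher m = dates.forLanguage("de-AT").matcher("Gesendet am 23.07.2013.");
        while (m.find()) {
            System.out.println(m.group() + " @ " + m.start() + "-" + m.end());
        }
    }
}
```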

What ideas or feedback do people have on this?

for(;;); /* eantelman@gmail.com */

Re: Entity Tagging by RE pattern

Posted by Rupert Westenthaler <ru...@gmail.com>.
Hi Erik, all,

not to forget the UIMA Regex Annotator [1].

Currently there is no Stanbol EnhancementEngine that supports pattern-based
feature extraction (e.g. by using regex). A generic EnhancementEngine
similar to the UIMA Regex Annotator would definitely be a well-received
addition to Apache Stanbol.

A configuration would need to include two things

(1) the pattern (e.g. regex) used to detect and extract features from the text
(2) the annotation: how features extracted by the pattern should be
represented as RDF (e.g. a SPARQL CONSTRUCT template that takes its data
from the pattern match instead of from the select part of the query)
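
[Editor's sketch] The two configuration parts above could be paired roughly
as follows. This is a minimal illustration in plain Java using only
java.util.regex; it is not a Stanbol EnhancementEngine and uses no Stanbol
API. The class name, the {start}/{end}/{text}/{n} placeholder syntax and the
ex: property names are assumptions made up for the example. A real engine
would write triples into the content item's metadata graph and would need to
escape literal values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: one rule = (1) a regex plus (2) an RDF-like output template. */
public class PatternRuleSketch {

    private final Pattern pattern;   // (1) the pattern that detects features in the text
    private final String template;   // (2) how each match is rendered (Turtle-like text)

    public PatternRuleSketch(String regex, String template) {
        this.pattern = Pattern.compile(regex);
        this.template = template;
    }

    /**
     * Returns one filled-in template per match. {start}, {end} and {text} are
     * replaced with the match offsets and matched text, {1}..{n} with the
     * capture groups. Values are inserted as-is, without any escaping.
     */
    public List<String> annotate(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            String s = template
                    .replace("{start}", Integer.toString(m.start()))
                    .replace("{end}", Integer.toString(m.end()))
                    .replace("{text}", m.group());
            for (int g = 1; g <= m.groupCount(); g++) {
                String value = m.group(g) == null ? "" : m.group(g);
                s = s.replace("{" + g + "}", value);
            }
            out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed rule: ISO dates, rendered with made-up ex: properties.
        PatternRuleSketch isoDate = new PatternRuleSketch(
                "\\b(\\d{4})-(\\d{2})-(\\d{2})\\b",
                "[] ex:start {start} ; ex:end {end} ; ex:selectedText \"{text}\" ; ex:year {1} .");
        for (String line : isoDate.annotate("Posted on 2013-07-23 to the list.")) {
            System.out.println(line);
        }
    }
}
```

Keeping the pattern and the template as two separate strings mirrors the
(1)/(2) split: the same regex could be reused with a different RDF
representation, and vice versa.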

WDYT
Rupert


[1] http://uima.apache.org/downloads/sandbox/RegexAnnotatorUserGuide/RegexAnnotatorUserGuide.html




-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen