Posted to dev@stanbol.apache.org by Wayne Rasmuss <wa...@perceptivesoftware.com> on 2012/09/27 18:17:59 UTC

Replace or augment UIMA/OpenNLP pipeline with Stanbol

I've been working with UIMA and OpenNLP together. Basically I've got the
OpenNLP/UIMA example working. This gives me annotated text with tokens,
sentences, parts of speech, and chunks (verb phrase, noun phrase, etc.). It
also attempts to recognize organizations, dates and locations, though I
don't get reliable results with them. Mostly I'm interested in parts of
speech and chunks anyway.

I've been looking around and Stanbol looks like it may be easier to deal
with and give me more advanced capabilities. I've done the first part of
the getting started guide, but not the "full" version. I got the web
interface up and was able to get some enhanced text. So that was great.

After that I'm kind of stumped. I would like to get the annotated text
(like I'm getting from UIMA/OpenNLP) so we can do analysis on it. Can
someone help me get started with setting up/calling Stanbol so I can get
the details in the enhanced result?

We're working with Groovy as our glue code. Bertrand provided me with this
example: https://gist.github.com/2931050, which looks very promising. I think
what I need to do is basically add OpenNLP enhancers there and figure out
how to call it.

Any help would be great!

Thanks
Wayne

--
Wayne Rasmuss
Software Engineer
Perceptive Software

wayne.rasmuss@perceptivesoftware.com
www.perceptivesoftware.com

+1 913 422 7525 corporate
+1 913 667 6630 direct
+1 785 769 9545 mobile

Re: Replace or augment UIMA/OpenNLP pipeline with Stanbol

Posted by Bertrand Delacretaz <bd...@apache.org>.

On Fri, Sep 28, 2012 at 8:42 AM, Rupert Westenthaler
<ru...@gmail.com> wrote:
> ...AFAIK the Apache Camel example
> provided by Bertrand should allow you to call the according
> Engines/Chain and also support direct access to the results stored in
> the AnalyzedText content part...

Yes, that's the idea - build an enhancement chain as a scripted flow,
where intermediate results can influence what happens next. My
prototype at https://gist.github.com/2931050 does not even use actual
enhancement engines, it's just an expression of that idea so far.

The idea behind that is to have a completely stateless Stanbol
enhancement engine, where a request would include the enhancement
chain flow script as well as the input to analyze. If this works, it
would make it easier to use Stanbol as a pure enhancement service,
where different users can have independent enhancement chains without
having to configure anything on the Stanbol instance. That also makes
load balancing and scaling easier.

-Bertrand (still suffering from chronic ENOTIME, not sure when I'll
have time to work on that)

Re: Replace or augment UIMA/OpenNLP pipeline with Stanbol

Posted by Rupert Westenthaler <ru...@gmail.com>.

Hi Wayne,

On Thu, Sep 27, 2012 at 6:17 PM, Wayne Rasmuss
<wa...@perceptivesoftware.com> wrote:
> I've been working with UIMA and OpenNLP together. Basically I've got the
> OpenNLP/UIMA example working. This gives me annotated text with tokens,
> sentences, parts of speech, chunks (verb phrase, noun phrase, etc.) It also
> attempts organizations, dates and locations though I don't get reliable
> results with them. Mostly I'm interested in parts of speech and chunks
> anyway.
>

Word-level NLP annotations are currently not included in the
enhancement results, mainly because they would add 20+ triples per
word. However, this feature will be added with STANBOL-733 "Stanbol
NLP processing". Development happens in a separate branch [1]. This
branch also includes its own Stanbol Launcher that makes it easy to
test the current state of development (build and start the launcher,
then post some text to
http://localhost:8080/enhancer/chain/nlp-processing).
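
For anyone following along, a rough sketch of posting text to that chain
from Java (11 or later, using only the JDK's java.net.http client) might
look like this. The class and method names (StanbolNlpClient,
buildRequest) are made up for illustration, and asking for Turtle via the
Accept header is an assumption about which RDF serializations the
launcher offers; only the chain URL comes from this mail.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Stanbol: posts plain text to an
// enhancement chain of a locally running Stanbol launcher.
public class StanbolNlpClient {

    // Builds a POST request for <baseUrl>/enhancer/chain/<chain>.
    // ASSUMPTION: the launcher can serialize its results as Turtle (text/turtle).
    public static HttpRequest buildRequest(String baseUrl, String chain, String text) {
        // Drop a trailing slash so the URL does not contain "//enhancer".
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/enhancer/chain/" + chain))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://localhost:8080", "nlp-processing",
                "Wayne wants part-of-speech tags and chunks for this sentence.");
        HttpClient client = HttpClient.newHttpClient();
        // Sends the request synchronously and reads the whole response body as a String.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        // The body is the RDF enhancement result; on this branch it is expected
        // to contain the NIF-based NLP annotations.
        System.out.println(response.body());
    }
}
```

Running main requires the launcher from [1] to be listening on port 8080;
otherwise client.send throws a java.net.ConnectException. Since Groovy can
call these JDK classes directly, the same request should also work from
Groovy glue code.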

I will give you a short overview. Details can be found in JIRA:

* AnalysedText: a Java domain model that represents the results of NLP.
The AnalysedText is added to the ContentItem as a ContentPart (see
STANBOL-734 for code examples).
* NLP2RDF: an EnhancementEngine that converts the information in the
AnalysedText to RDF using NIF (NLP Interchange Format), a set of OWL
ontologies for formally representing NLP results (see STANBOL-741).
NOTE that the NLP results provided by the nlp-processing chain of the
Stanbol Launcher already use NIF.
* The opennlp.pos EnhancementEngine supports POS tagging of texts in
all languages supported by OpenNLP (STANBOL-735). As part of that, it
also detects and adds Sentence annotations. The opennlp.chunker
EnhancementEngine consumes Tokens and POS tags and performs chunking
(STANBOL-736). Chunking is supported for English and German. There is
also a sentiment.wordclassifier EnhancementEngine that adds sentiment
tags at the word level (based on SentiWordNet for English and SentiWS
for German).

You might also have a look at the presentation [2] about the Stanbol NLP
processing module that I gave at the MOLDE workshop in Leipzig this week.

[1] http://svn.apache.org/repos/asf/stanbol/branches/stanbol-nlp-processing/
[2] http://stanbol.apache.org/presentations/Stanbol_NLP_processing_2012-09.pdf

> I've been looking around and Stanbol looks like it may be easier to deal
> with and give me more advanced capabilities. I've done the first part of
> the getting started guide, but not the "full" version. I got he web
> interface up and was able to get some enhanced text. So that was great.
>
> After that I'm kind of stumped. I would like to get the annotated text
> (like I'm getting from UIMA/OpenNLP) so we can do analysis on it. Can
> someone help get started with setting up/calling stanbol so I can get the
> details in the enhanced result?
>

If you want to stay with the RESTful service, you will need to
implement against the NIF generated by the NLP2RDF engine. If you
plan to access the Stanbol Enhancer via its Java API, I think the
API of the AnalysedText (STANBOL-734) should give you everything you
need.

You might also want to consider implementing your own analysis as a
Stanbol EnhancementEngine. This blog post [3] provides a good
introduction to doing that.

[3] http://blog.iks-project.eu/getting-started-with-apache-stanbol-enhancement-engine/

>
> We're working with Groovy as our glue code. Bertrand provided me with this
> example.https://gist.github.com/2931050 which looks very promising, I think
> what I need to do is basically add OpenNLP enhancers here and figure out
> how to call it.
>

The "opennlp.pos" and "opennlp.chunker" engines should provide exactly
the information you are looking for. AFAIK the Apache Camel example
provided by Bertrand should allow you to call the corresponding
engines/chain and also support direct access to the results stored in
the AnalysedText content part. But as I am not familiar with Camel, it
would be good if Bertrand could confirm this.

Please NOTE that Stanbol NLP processing is still under heavy
development, so things might still change. The current plan is to have
a first, reasonably stable version of STANBOL-733 available in trunk
by the end of October.

best
Rupert

-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen

Re: Replace or augment UIMA/OpenNLP pipeline with Stanbol

Posted by Mihály Héder <he...@gmail.com>.

Hi,

On 27 September 2012 18:17, Wayne Rasmuss <
wayne.rasmuss@perceptivesoftware.com> wrote:

> I've been working with UIMA and OpenNLP together. Basically I've got the
> OpenNLP/UIMA example working. This gives me annotated text with tokens,
> sentences, parts of speech, chunks (verb phrase, noun phrase, etc.) It also
> attempts organizations, dates and locations though I don't get reliable
> results with them. Mostly I'm interested in parts of speech and chunks
> anyway.
>
> I've been looking around and Stanbol looks like it may be easier to deal
> with and give me more advanced capabilities. I've done the first part of
> the getting started guide, but not the "full" version. I got he web
> interface up and was able to get some enhanced text. So that was great.
>
> After that I'm kind of stumped. I would like to get the annotated text
> (like I'm getting from UIMA/OpenNLP) so we can do analysis on it. Can
> someone help get started with setting up/calling stanbol so I can get the
> details in the enhanced result?
>

If you had an established UIMA infrastructure, I would recommend the UIMA
integration tools for Stanbol:
http://blog.iks-project.eu/tag/uima/

However, since you want to use OpenNLP specifically, the best approach is
indeed to use the native Stanbol enhancement engines (EEs) and to replace
UIMA with Stanbol for managing the analysis chain.

Cheers
Mihály

