Posted to users@opennlp.apache.org by Rui Lopes <rl...@ipb.pt.INVALID> on 2016/04/19 16:07:12 UTC

UIMA FIT Pipeline with OpenNLP tokeniser

Hi all,

I’m trying to use the OpenNLP UIMA components to build a very simple pipeline:

CollectionReaderDescription reader = CollectionReaderFactory
				.createReaderDescription(AbstractCollectionReader.class, AbstractCollectionReader.PARAM_VALUE, 33);

AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(SimpleTokenizer.class,
				"opennlp.uima.SentenceType", "pt.ipb.pos.type.Sentence", "opennlp.uima.TokenType",
				"pt.ipb.pos.type.Token");


AnalysisEngineDescription ae = AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);

SimplePipeline.runPipeline(reader, tokenizer, ae);


------
The GetStartedQuickAE just prints the Annotations:

	@Override
	public void process(JCas jCas) throws AnalysisEngineProcessException {
		System.out.println(jCas.getDocumentText());
		
		for(Annotation a : jCas.getAnnotationIndex()) {
			System.out.println(a);
		}
		
		System.out.println("Done");
		

	}


———
The output is:


Apr 19, 2016 3:04:46 PM opennlp.uima.tokenize.AbstractTokenizer initialize(71)
INFO: Initializing the OpenNLP Simple Tokenizer annotator.
Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
INFO: opennlp.uima.IsRemoveExistingAnnotations = not set
Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
INFO: opennlp.uima.SentenceType = pt.ipb.pos.type.Sentence
Apr 19, 2016 3:04:46 PM opennlp.uima.util.AnnotatorUtil getOptionalParameter(440)
INFO: opennlp.uima.TokenType = pt.ipb.pos.type.Token
This article aims to observe the didactic action and its epistemological insertion in education trends as well as its role as a medium capable of causing changes in this alignment. Main objective is the need to consciously integrate between epistemology and education trends didactic application. The methodological procedure trend the application relied on observations from years in which the subjects were given Cytology and Histology in undergraduate courses. The results of observations point to a single procedure, with little clarity regarding the alignment epistemology, educational trends, teaching action. Associate art practice can provide a biological alternative capable of generating a position and "profitable shifts" in epistemological and pedagogical articulating. Different strategies need to be created to establish conditions that allow the configuration of knowledge as a whole, while respecting cultural diversity in which knowledge is configured.
DocumentAnnotation
   sofa: _InitialView
   begin: 0
   end: 969
   language: "x-unspecified"

Done


There is only one annotation. Does anyone know why?

Thanks for any feedback!

All the best,

Rui Lopes


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Rui Lopes <xi...@gmail.com>.
That did it!
Easier than I thought!

Thanks,

/rp

> On 20 Apr 2016, at 09:36, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> You can also configure your project so that uimaFIT automatically finds your type system; then you do not have to pass it everywhere.
> 
> See: https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
> 
> Cheers,
> 
> -- Richard


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Richard Eckart de Castilho <re...@apache.org>.
You can also configure your project so that uimaFIT automatically finds your type system; then you do not have to pass it everywhere.

See: https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
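A minimal sketch of that setup, assuming the convention described in the uimaFIT documentation linked above: a file named META-INF/org.apache.uima.fit/types.txt on the classpath lists one location pattern per line for the type system descriptors. The desc/types folder and the temporary directory standing in for src/main/resources are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Plain Java only: this just creates the file; it does not call uimaFIT.
final class TypesTxtSketch {

    // Path of the types.txt file, relative to a classpath (resources) root.
    static final String TYPES_TXT = "META-INF/org.apache.uima.fit/types.txt";

    // Writes a types.txt with one location pattern per line and returns its path.
    static Path writeTypesTxt(Path resourcesRoot, List<String> patterns) {
        try {
            Path file = resourcesRoot.resolve(TYPES_TXT);
            Files.createDirectories(file.getParent());
            return Files.write(file, patterns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the project's src/main/resources folder.
        Path root = Files.createTempDirectory("resources");
        // Hypothetical pattern: every XML descriptor below desc/types on the classpath.
        Path file = writeTypesTxt(root, List.of("classpath*:desc/types/**/*.xml"));
        System.out.println(file);
        System.out.println(Files.readAllLines(file));
    }
}
```

With such a file in place, the type system descriptors matching the pattern should be picked up without passing a TypeSystemDescription to each factory call.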

Cheers,

-- Richard

> On 20.04.2016, at 10:13, Rui Lopes <rl...@ipb.pt.INVALID> wrote:
> 
> Success!!! That solved it.
> 
> I had only tried to pass ts to the CollectionReader, but it is necessary to pass it to all components.
> 
> Thanks a lot!
> 
> Cheers,
> 
> /rp


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Rui Lopes <rl...@ipb.pt.INVALID>.
Success!!! That solved it.

I had only tried to pass ts to the CollectionReader, but it is necessary to pass it to all components.

Thanks a lot!

Cheers,

/rp


> On 20 Apr 2016, at 07:23, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> Hi,
> 
> you can load the type system first:
> 
> 	TypeSystemDescription ts = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");
> 
> Then pass "ts" as the second argument to all the create* calls, just after the class name, e.g.:
> 
> CollectionReaderDescription reader = CollectionReaderFactory
> 				.createReaderDescription(AbstractCollectionReader.class, ts, AbstractCollectionReader.PARAM_VALUE, 33);
> 
> Cheers,
> 
> -- Richard


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

you can load the type system first:

	TypeSystemDescription ts = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath("http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");

Then pass "ts" as the second argument to all the create* calls, just after the class name, e.g.:

CollectionReaderDescription reader = CollectionReaderFactory
				.createReaderDescription(AbstractCollectionReader.class, ts, AbstractCollectionReader.PARAM_VALUE, 33);

Cheers,

-- Richard

> On 19.04.2016, at 23:13, Rui Lopes <rl...@ipb.pt.INVALID> wrote:
> 
> The issue is that I would like to use a CollectionReader instead of creating the document. Something like this:
> 
> 	private static void mine() throws UIMAException, IOException {
> 
> 		CollectionReaderDescription reader = CollectionReaderFactory
> 				.createReaderDescription(AbstractCollectionReader.class, AbstractCollectionReader.PARAM_VALUE, 33);
> 
> 		AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(SimpleTokenizer.class,
> 				UimaUtil.TOKEN_TYPE_PARAMETER, "pt.ipb.pos.type.Token", UimaUtil.SENTENCE_TYPE_PARAMETER,
> 				"pt.ipb.pos.type.Sentence");
> 
> 		// AnalysisEngineDescription histogramer = AnalysisEngineFactory.createEngineDescription(HistogramAnnotator.class);
> 
> 		AnalysisEngineDescription ae = AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);
> 
> 		SimplePipeline.runPipeline(reader, tokenizer, ae);
> 
> 	}
> 
> 
> Should I initialise the type system to the OpenNLP one? How can I do that?
> 
> Is it possible to use a custom type system as in the code above (“pt.ipb.pos.type.Token”)? How?
> 
> Sorry about these probably naive questions, but I confess I must be missing something basic…
> 
> Cheers,
> 
> /rp


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Rui Lopes <rl...@ipb.pt.INVALID>.
I’m sorry, but I’m a little lost…

I tried the example you suggested after adapting it to Java, and it works beautifully:

	private static void fromGroovy() throws UIMAException {
		
		// Create document to be analyzed
		JCas document = JCasFactory.createJCasFromPath(
				"http://svn.apache.org/repos/asf/opennlp/tags/opennlp-1.6.0-rc6/opennlp-uima/descriptors/TypeSystem.xml");

		document.setDocumentText("The quick brown fox jumps over the lazy dog. Later, he jumped over the moon.");
		document.setDocumentLanguage("en");

		Type tokenType = document.getTypeSystem().getType("opennlp.uima.Token");
		Type sentenceType = document.getTypeSystem().getType("opennlp.uima.Sentence");
		Feature posFeature = tokenType.getFeatureByBaseName("pos");
		
		System.out.println(sentenceType.getName());

		AnalysisEngineDescription sentenceDetector = AnalysisEngineFactory.createEngineDescription(
				SentenceDetector.class, UimaUtil.SENTENCE_TYPE_PARAMETER, sentenceType.getName());

		// Configure sentence detector
		ExternalResourceFactory.createDependencyAndBind(sentenceDetector, UimaUtil.MODEL_PARAMETER,
				SentenceModelResourceImpl.class, "http://opennlp.sourceforge.net/models-1.5/en-sent.bin");

		// Configure tokenizer
		AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(Tokenizer.class,
				UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(), UimaUtil.SENTENCE_TYPE_PARAMETER,
				sentenceType.getName());

		ExternalResourceFactory.createDependencyAndBind(tokenizer, UimaUtil.MODEL_PARAMETER,
				TokenizerModelResourceImpl.class, "http://opennlp.sourceforge.net/models-1.5/en-token.bin");

		// Configure part-of-speech tagger
		AnalysisEngineDescription posTagger = AnalysisEngineFactory.createEngineDescription(POSTagger.class,
				UimaUtil.TOKEN_TYPE_PARAMETER, tokenType.getName(), UimaUtil.SENTENCE_TYPE_PARAMETER,
				sentenceType.getName(), UimaUtil.POS_FEATURE_PARAMETER, posFeature.getShortName());

		ExternalResourceFactory.createDependencyAndBind(posTagger, UimaUtil.MODEL_PARAMETER, POSModelResourceImpl.class,
				"http://opennlp.sourceforge.net/models-1.5/en-pos-perceptron.bin");

		// Run pipeline
		SimplePipeline.runPipeline(document, sentenceDetector, tokenizer, posTagger);

		// Display results
		for (AnnotationFS sentence : CasUtil.select(document.getCas(), sentenceType)) {
			for (AnnotationFS token : CasUtil.selectCovered(tokenType, sentence)) {
				System.out.println(token.getCoveredText() + " " + token.getFeatureValueAsString(posFeature));
			}
		}

	}



The issue is that I would like to use a CollectionReader instead of creating the document. Something like this:

	private static void mine() throws UIMAException, IOException {

		CollectionReaderDescription reader = CollectionReaderFactory
				.createReaderDescription(AbstractCollectionReader.class, AbstractCollectionReader.PARAM_VALUE, 33);

		AnalysisEngineDescription tokenizer = AnalysisEngineFactory.createEngineDescription(SimpleTokenizer.class,
				UimaUtil.TOKEN_TYPE_PARAMETER, "pt.ipb.pos.type.Token", UimaUtil.SENTENCE_TYPE_PARAMETER,
				"pt.ipb.pos.type.Sentence");

		// AnalysisEngineDescription histogramer = AnalysisEngineFactory.createEngineDescription(HistogramAnnotator.class);

		AnalysisEngineDescription ae = AnalysisEngineFactory.createEngineDescription(GetStartedQuickAE.class);

		SimplePipeline.runPipeline(reader, tokenizer, ae);

	}


Should I initialise the type system to the OpenNLP one? How can I do that?

Is it possible to use a custom type system as in the code above (“pt.ipb.pos.type.Token”)? How?

Sorry about these probably naive questions, but I confess I must be missing something basic…

Cheers,

/rp



> On 19 Apr 2016, at 15:18, Richard Eckart de Castilho <re...@apache.org> wrote:
> 
> Short answer: no :)
> 
> Longer answer: You don't seem to be using the actual OpenNLP UIMA components.
> 
> If you want an example (in Groovy, but should be trivial to transfer to Java)
> on how to use the OpenNLP UIMA components with uimaFIT, see here:
> 
>  https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
> 
> Cheers,
> 
> -- Richard
> 


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Rui Lopes <rl...@ipb.pt.INVALID>.
Hi Raj,

Got it working now!
After succeeding with the OpenNLP type system, I switched to my own and it worked as you suggested.

Moreover, the SentenceDetector advice was also valuable!

Thanks a lot!

Cheers,

/rp


> On 20 Apr 2016, at 06:03, Raj kiran <ra...@gmail.com> wrote:
> 
> Sorry, I thought you had already added the new types. You can add your custom
> types by defining your own type system. It's actually simple; see the
> following link for details:
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
> 
> 
> Basically, you have to add a types.txt file (containing the locations of the
> type system XML files). You can refer to the OpenNLP type system XML when
> adding new types for sentence and token, for example:
> <typeDescription>
>   <name>pt.ipb.pos.type.Token</name>
>   <supertypeName>uima.tcas.Annotation</supertypeName>
>   <features>
>     <featureDescription>
>       <name>pos</name>
>       <description>Part of speech</description>
>       <rangeTypeName>uima.cas.String</rangeTypeName>
>     </featureDescription>
>   </features>
> </typeDescription>
> 
> Also, if a type were missing, an exception should have been thrown, so
> you may have to check your collection reader code. A sample collection
> reader is available in the uimaFIT examples in the source. You can start with
> the document approach and, once everything is working, test the collection
> reader approach.
> 
> 
> Regards,
> Raj


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Raj kiran <ra...@gmail.com>.
Sorry, I thought you had already added the new types. You can add your custom
types by defining your own type system. It's actually simple; see the
following link for details:
https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem


Basically, you have to add a types.txt file (containing the locations of the
type system XML files). You can refer to the OpenNLP type system XML when
adding new types for sentence and token, for example:
<typeDescription>
  <name>pt.ipb.pos.type.Token</name>
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <featureDescription>
      <name>pos</name>
      <description>Part of speech</description>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
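
A typeDescription element like the one above has to live inside a complete type system descriptor before it can be loaded. The following is only a sketch, in plain Java with no UIMA dependency, that assembles such a descriptor around the Token type and an assumed companion pt.ipb.pos.type.Sentence type, then checks that the result is well-formed XML. The descriptor name PosTypeSystem and the class and method names are hypothetical; the wrapper elements follow the usual UIMA type system descriptor layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class CustomTypeSystemSketch {

    // Builds one <typeDescription>; featureName may be null for a type without features.
    static String typeDescription(String name, String featureName) {
        StringBuilder sb = new StringBuilder();
        sb.append("    <typeDescription>\n");
        sb.append("      <name>").append(name).append("</name>\n");
        sb.append("      <supertypeName>uima.tcas.Annotation</supertypeName>\n");
        if (featureName != null) {
            sb.append("      <features>\n");
            sb.append("        <featureDescription>\n");
            sb.append("          <name>").append(featureName).append("</name>\n");
            sb.append("          <rangeTypeName>uima.cas.String</rangeTypeName>\n");
            sb.append("        </featureDescription>\n");
            sb.append("      </features>\n");
        }
        sb.append("    </typeDescription>\n");
        return sb.toString();
    }

    // Wraps the Sentence and Token types in a full descriptor document.
    static String descriptor() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<typeSystemDescription xmlns=\"http://uima.apache.org/resourceSpecifier\">\n"
                + "  <name>PosTypeSystem</name>\n" // hypothetical descriptor name
                + "  <types>\n"
                + typeDescription("pt.ipb.pos.type.Sentence", null)
                + typeDescription("pt.ipb.pos.type.Token", "pos")
                + "  </types>\n"
                + "</typeSystemDescription>\n";
    }

    // Parses the XML (failing if it is not well-formed) and counts the declared types.
    static int countTypes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("typeDescription").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("descriptor is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = descriptor();
        System.out.println(xml);
        System.out.println("types defined: " + countTypes(xml));
    }
}
```

The printed XML could be saved as a descriptor file and referenced from types.txt, or loaded explicitly by path.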

Also, if a type were missing, an exception should have been thrown, so
you may have to check your collection reader code. A sample collection
reader is available in the uimaFIT examples in the source. You can start with
the document approach and, once everything is working, test the collection
reader approach.


Regards,
Raj


On Wed, Apr 20, 2016 at 2:45 AM, Rui Lopes <rl...@ipb.pt.invalid> wrote:

> Thank you, Raj!
>
> I tried it but no success… the Annotations keep being only one.
> Should it be related to the type system?
>
> Cheers,
>
> /rp

Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Rui Lopes <rl...@ipb.pt.INVALID>.
Thank you, Raj!

I tried it, but without success… there is still only one annotation.
Could it be related to the type system?

Cheers,

/rp


> On 19 Apr 2016, at 17:38, Raj kiran <ra...@gmail.com> wrote:
> 
> I believe you are missing the SentenceDetector engine in the pipeline. It
> should be added before SimpleTokenizer.
> 
> SimpleTokenizer iterates over the sentences in the document, and in the absence
> of sentence annotations the tokenizer adds no tokens to the CAS.
> 
> Hope it helps.
> 
> Regards,
> Raj


Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Raj kiran <ra...@gmail.com>.
I believe you are missing the SentenceDetector engine in the pipeline. It
should be added before SimpleTokenizer.

SimpleTokenizer iterates over the sentences in the document, and in the absence
of sentence annotations the tokenizer adds no tokens to the CAS.
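
To make that behaviour concrete, here is a toy model in plain Java. It uses no UIMA or OpenNLP classes; the Span class and the tokenize method are hypothetical stand-ins for annotations and for a tokenizer that only looks inside sentence spans, so it illustrates the description above rather than the actual OpenNLP implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: plain Java, no UIMA or OpenNLP classes involved.
final class SentenceScopedTokenizerSketch {

    // A typed text span, loosely mimicking an annotation with begin/end offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Whitespace-tokenizes only the text covered by spans of type "Sentence".
    static List<Span> tokenize(String text, List<Span> annotations) {
        List<Span> tokens = new ArrayList<>();
        for (Span s : annotations) {
            if (!"Sentence".equals(s.type)) {
                continue; // anything that is not a sentence is ignored
            }
            int i = s.begin;
            while (i < s.end) {
                // Skip whitespace, then consume one run of non-whitespace characters.
                while (i < s.end && Character.isWhitespace(text.charAt(i))) i++;
                int start = i;
                while (i < s.end && !Character.isWhitespace(text.charAt(i))) i++;
                if (i > start) tokens.add(new Span("Token", start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox.";

        // Only a document-level span exists, so there is nothing to iterate over.
        List<Span> onlyDocument = List.of(new Span("DocumentAnnotation", 0, text.length()));
        System.out.println(tokenize(text, onlyDocument).size());

        // Once a sentence span is present, tokens are produced for its text.
        List<Span> withSentence = List.of(
                new Span("DocumentAnnotation", 0, text.length()),
                new Span("Sentence", 0, text.length()));
        System.out.println(tokenize(text, withSentence).size());
    }
}
```

In this model the first call yields no tokens and the second yields one token per whitespace-separated word, which mirrors a pipeline that prints only the DocumentAnnotation when no sentence annotations were created first.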

Hope it helps.

Regards,
Raj

On Tue, Apr 19, 2016 at 7:48 PM, Richard Eckart de Castilho <re...@apache.org>
wrote:

> Short answer: no :)
>
> Longer answer: You don't seem to be using the actual OpenNLP UIMA
> components.
>
> If you want an example (in Groovy, but should be trivial to transfer to
> Java)
> on how to use the OpenNLP UIMA components with uimaFIT, see here:
>
>   https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy
>
> Cheers,
>
> -- Richard

Re: UIMA FIT Pipeline with OpenNLP tokeniser

Posted by Richard Eckart de Castilho <re...@apache.org>.
Short answer: no :)

Longer answer: You don't seem to be using the actual OpenNLP UIMA components.

If you want an example (in Groovy, but should be trivial to transfer to Java)
on how to use the OpenNLP UIMA components with uimaFIT, see here:

  https://cwiki.apache.org/confluence/display/UIMA/uimaFIT+and+Groovy

Cheers,

-- Richard
