Posted to dev@ctakes.apache.org by "Abramowitsch, Peter" <pa...@hearst.com> on 2016/06/21 15:20:02 UTC
Problems initializing new Annotator

Help! I need suggestions for where else to look for a problem with adding a
new annotation engine to a pipeline. I've run out of ideas.

The new annotator is a UimaTokensRegex annotator that builds an annotation called
MultipleWordTermOccurrence.
Here is how far I've gotten:

* Created an analysis engine descriptor that imports a TypeSystem file defining
MultipleWordTermOccurrence
* Generated the JCas classes
* An analysis engine can be created with this annotator
* The type system is being parsed. I can verify this because when I change
the name of the file, it complains
* I can add the engine to the fastUMLSPipeline, or call its process(jCas)
on its own after the pipeline has run
* The annotator's logic definitely runs
SYMPTOM

When the annotation is about to be written, I do this:

	jCas.getCasType(MultiWordTermOccurrence.type);

I get this exception:

	"JCas type "com.hbm.MultiWordTermOccurrence" used in Java code, but was not declared in the XML type descriptor."


EVIDENCE

* Class MultiWordTermOccurrence has definitely been referenced during
initialization, and I can step through the casRegister() code.
* I get back a typeIndex of 253.
* But at annotation time, when I look at the JCas's typeArray[253], it is null.
* There seem to be two class loaders for my jCas, and I can't figure out why.

PROBLEM

I have noticed that the constructor of MultiWordTermOccurrence_Type is
never called. I can see that the CasImpl code in getType(int i) provides
for creating it lazily if that wasn't done before, but the code seems to
think the _Type object doesn't need building.
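For anyone following along, the evidence above fits one specific situation, sketched here as a toy model in plain Java. Everything in it (JCasLookupModel, ToyJCas, ToyType, register, org.example.Token) is hypothetical and simplified, not UIMA's actual code. It only assumes what is described above: a generated class receives a global type index when it is registered, and a JCas fills its type array lazily. If the JCas was created from a type system that does not declare the new type, registration still succeeds and returns an index, the _Type constructor never runs, the slot stays null, and the lookup fails with the same message quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class JCasLookupModel {
    // Hypothetical stand-in for a global registry: each generated class receives
    // an index when it is registered, whether or not any type system declares it.
    static final List<String> REGISTRY = new ArrayList<>();

    static int register(String typeName) {
        REGISTRY.add(typeName);
        return REGISTRY.size() - 1;
    }

    // Hypothetical stand-in for a generated *_Type class; counts constructor calls.
    static final class ToyType {
        static int constructed = 0;
        final String name;

        ToyType(String name) {
            this.name = name;
            constructed++;
        }
    }

    // Hypothetical per-JCas table: slots are filled lazily, but only for types
    // declared by the type system this JCas was created from.
    static final class ToyJCas {
        private final Set<String> declaredTypes;
        private final ToyType[] typeArray;

        ToyJCas(Set<String> declaredTypes) {
            this.declaredTypes = declaredTypes;
            // Sized to the registry as it stands when this JCas is created.
            this.typeArray = new ToyType[REGISTRY.size()];
        }

        ToyType getType(int index) {
            String name = REGISTRY.get(index);
            if (typeArray[index] == null && declaredTypes.contains(name)) {
                typeArray[index] = new ToyType(name); // lazy construction
            }
            if (typeArray[index] == null) {
                // Undeclared type: the _Type constructor was never called.
                throw new IllegalStateException("JCas type \"" + name
                        + "\" used in Java code, but was not declared in the XML type descriptor.");
            }
            return typeArray[index];
        }
    }

    public static void main(String[] args) {
        int tokenId = register("org.example.Token");
        int mwtId = register("com.hbm.MultiWordTermOccurrence");
        // The JCas was created from a type system that lacks the new type.
        ToyJCas jcas = new ToyJCas(Set.of("org.example.Token"));
        System.out.println(jcas.getType(tokenId).name);
        try {
            jcas.getType(mwtId);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("_Type constructors run: " + ToyType.constructed);
    }
}
```

If the real behavior matches this model, one thing worth checking is which type system _jcas was created from, since a successful registration alone does not guarantee that a given JCas knows the type.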


I've tried different ways of initializing the pipeline, and different
versions of UIMA (cTAKES uses 2.4.0; I also tried UIMA 2.8.0).
I've tried building the type system into the engine descriptor.
I've tried renaming the XML resources to make sure the correct ones are
being loaded.

I know this is more of a UIMA problem than a cTAKES problem, but I was
wondering whether any modifications have been made to the initialization
code in cTAKES' bundled UIMA that would require the engine to be
constructed in a special way.

If I comment out the process() call on the extra annotator engine,
everything works fine.


Any suggestions of where else to look?



----------  SNIPPETS of relevant code -------
XMLInputSource in = new XMLInputSource("desc/uima-regex-annotator/TkRegexDescriptorAE.xml");
ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
return UIMAFramework.produceAnalysisEngine(specifier);
-------------------------------

_tkregexae = buildTokensRegex();
_aed = getFastPipeline();
_aae = createEngine(_aed);
-------------------------------
_jcas.setDocumentText(req.body());
_aae.process(_jcas);
_tkregexae.process(_jcas);
--------------------------------


Type type = jCas.getCasType(MultiWordTermOccurrence.typeIndexID);  // <---- where it blows up
AnnotationFS annotation = jCas.getCas().createAnnotation(type, occ.getBegin(), occ.getEnd());
MultiWordTermOccurrence a = (MultiWordTermOccurrence) annotation;
StringArray patternFeature = new StringArray(jCas, occ.size());