Posted to dev@ctakes.apache.org by "Abramowitsch, Peter" <pa...@hearst.com> on 2016/06/21 15:20:02 UTC
Problems initializing new Annotator
Help: I need suggestions for where else to look for a problem with adding a
new annotation engine to a pipeline. I've run out of ideas.
The new annotator is a UimaTokensRegex annotator that builds an annotation
called MultipleWordTermOccurrence.
Here is where I've gotten to:
* Created an engine descriptor that imports a TypeSystem file defining
MultipleWordTermOccurrence
* Generated the JCas classes
* An analysis engine can be created with this annotator
* The type system is being parsed: I can verify this because when I change
the name of the file, it complains
* I can add the engine to the fastUMLSPipeline, or run its process(jCas)
on its own afterwards
* The annotator's code is definitely being invoked
SYMPTOM:
When the annotation is about to be written, I call:
jCas.getCasType(MultiWordTermOccurrence.type);
I get this exception:
"JCas type "com.hbm.MultiWordTermOccurrence" used in Java code, but was
not declared in the XML type descriptor."
EVIDENCE
* Class MultiWordTermOccurrence has definitely been referenced during
initialization, and I can step through the casRegister() code.
* I get back a typeIndex of 253
* But at annotation time, when I look at the JCas's typeArray[253], it is null.
* There seem to be two class loaders for my jCas, and I can't figure out why.
PROBLEM: I have noticed that the constructor of
MultiWordTermOccurrence_Type is never called. I can see that the
CasImpl code in getType(int i) lazily builds it if it wasn't built
before, but the code seems to think the _Type object doesn't need
building.
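The evidence above (a valid typeIndex of 253, a null typeArray[253] at annotation time, and a _Type constructor that never runs) fits a split that UIMA 2.x JCas cover classes roughly follow: the type index ID is handed out by a JVM-wide registry when the cover class is loaded, while each JCas fills its own type array lazily, against the type system of the CAS it belongs to. The sketch below is a simplified, hypothetical model of that split, not UIMA's implementation; LazyTypeModel, ModelCas, register and getType are illustrative names only. It raises the same message when the CAS's type system does not declare the type, even though the registry ID itself is valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the lazy lookup; NOT UIMA's actual code.
// All class, field and method names here are hypothetical.
public class LazyTypeModel {

    // JVM-wide registry (stand-in for JCasRegistry): one ID per cover class,
    // assigned once and independent of any particular CAS.
    static final Map<String, Integer> REGISTRY = new HashMap<>();
    static final List<String> NAMES_BY_ID = new ArrayList<>();

    static synchronized int register(String className) {
        return REGISTRY.computeIfAbsent(className, name -> {
            NAMES_BY_ID.add(name);
            return NAMES_BY_ID.size() - 1;
        });
    }

    // Stand-in for one JCas: its set of declared types is fixed at creation.
    static final class ModelCas {
        private final Set<String> declaredTypes;
        private String[] typeArray = new String[0]; // slots start out null

        ModelCas(Set<String> declaredTypes) {
            this.declaredTypes = declaredTypes;
        }

        // Resolve a registry ID for this CAS, filling the slot on first use.
        String getType(int id) {
            if (id >= typeArray.length) {
                typeArray = Arrays.copyOf(typeArray, id + 1);
            }
            if (typeArray[id] == null) {
                String name = NAMES_BY_ID.get(id);
                if (!declaredTypes.contains(name)) {
                    // The ID is valid JVM-wide, but this CAS never declared the type.
                    throw new IllegalStateException("JCas type \"" + name
                            + "\" used in Java code, but was not declared in the XML type descriptor.");
                }
                typeArray[id] = name; // a real JCas would build the _Type object here
            }
            return typeArray[id];
        }
    }

    public static void main(String[] args) {
        int id = register("com.hbm.MultiWordTermOccurrence");
        ModelCas withType = new ModelCas(Set.of("com.hbm.MultiWordTermOccurrence"));
        ModelCas withoutType = new ModelCas(Set.of("uima.tcas.Annotation"));

        System.out.println(withType.getType(id));
        try {
            withoutType.getType(id);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main resolves the type for the CAS that declares it and prints the error message for the CAS that does not, even though both lookups use the same ID.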
I've tried different ways of initializing the pipeline, and different
versions of UIMA (cTAKES uses 2.4.0; I also tried UIMA 2.8.0).
I've tried building the type system into the engine descriptor file.
I've tried changing the names of the XML resources to make sure the
correct ones are being consumed.
I know this is more of a UIMA problem than a cTAKES problem, but I was
wondering whether any modifications have been made to the
initialization code in cTAKES' bundled UIMA that would require a
special or different way of constructing the engine.
If I comment out the call to process() on the extra annotator engine,
everything works fine.
Any suggestions of where else to look?
---------- SNIPPETS of relevant code -------
XMLInputSource in = new XMLInputSource("desc/uima-regex-annotator/TkRegexDescriptorAE.xml");
ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(in);
return UIMAFramework.produceAnalysisEngine(specifier);
-------------------------------
_tkregexae = buildTokensRegex();
_aed = getFastPipeline();
_aae = createEngine(_aed);
-------------------------------
_jcas.setDocumentText(req.body());
_aae.process(_jcas);
_tkregexae.process(_jcas);
--------------------------------
Type type = jCas.getCasType(MultiWordTermOccurrence.typeIndexID); // <---- where it blows up
AnnotationFS annotation = jCas.getCas().createAnnotation(type, occ.getBegin(), occ.getEnd());
MultiWordTermOccurrence a = (MultiWordTermOccurrence) annotation;
StringArray patternFeature = new StringArray(jCas, occ.size());
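In the snippets above, a single _jcas is processed first by _aae and then by _tkregexae, and the snippets do not show where _jcas is created. Because a CAS's type system is fixed when the CAS is created, a _jcas built only from the fast pipeline's descriptor would not contain types declared only in TkRegexDescriptorAE.xml. The sketch below is a simplified, hypothetical model of that check, not UIMA's API: SharedCasModel, EngineDesc, mergeTypeSystems and missingTypes are illustrative names, and the type sets assigned to each engine are assumptions. In real UIMA code, CasCreationUtils.mergeTypeSystems is one place that builds a single type system from several descriptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model, NOT UIMA's API: a CAS's type system is fixed when the CAS
// is created. EngineDesc, mergeTypeSystems and missingTypes are hypothetical.
public class SharedCasModel {

    // Stand-in for an engine descriptor: a name plus the types it declares.
    static final class EngineDesc {
        final String name;
        final Set<String> declaredTypes;

        EngineDesc(String name, Set<String> declaredTypes) {
            this.name = name;
            this.declaredTypes = declaredTypes;
        }
    }

    // Union of every engine's declared types: the type system for a shared CAS.
    static Set<String> mergeTypeSystems(List<EngineDesc> engines) {
        Set<String> merged = new LinkedHashSet<>();
        for (EngineDesc engine : engines) {
            merged.addAll(engine.declaredTypes);
        }
        return merged;
    }

    // Types an engine declares that the given CAS type system lacks.
    static Set<String> missingTypes(Set<String> casTypes, EngineDesc engine) {
        Set<String> missing = new LinkedHashSet<>(engine.declaredTypes);
        missing.removeAll(casTypes);
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical type sets for the two engines in the snippets.
        EngineDesc fast = new EngineDesc("fastUMLSPipeline",
                Set.of("org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation"));
        EngineDesc regex = new EngineDesc("TkRegexDescriptorAE",
                Set.of("com.hbm.MultiWordTermOccurrence"));

        // CAS created from the first engine only vs. from both engines.
        Set<String> fastOnly = mergeTypeSystems(List.of(fast));
        Set<String> both = mergeTypeSystems(List.of(fast, regex));

        System.out.println(missingTypes(fastOnly, regex)); // [com.hbm.MultiWordTermOccurrence]
        System.out.println(missingTypes(both, regex));     // []
    }
}
```

The model only compares type names; it says nothing about the two class loaders mentioned in the evidence, which would need to be checked separately.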