Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2017/02/24 19:25:33 UTC

UIMA XML parser, schema validation

The UIMA XML parser has an API to turn schema validation on and off.

While running the V3 tests, I found that, depending on the order in which the
tests run, testAdditionalAEs sometimes runs with schema validation "on"
(probably because an earlier test enabled it and did not reset it afterwards).

With it on, one of the test XML descriptors fails validation:
TextAnalysisEngineImplTest/MultipleAeTest4.xml

The part that fails is the custom index definition.  Each custom index
definition has a "label", and the UIMA reference manual says this has to be a
string. The schema, however, declares this value as a "Name", which is
defined at https://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Name

The value not passing is from this part of the descriptor:

<fsIndexDescription>
          <label>Annotation Bag Index</label>
          <typeName>uima.tcas.Annotation</typeName>

The <label> value fails because it contains blanks, which a "Name" does not
allow.
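
For anyone who wants to see the difference outside of UIMA, here is a minimal
sketch using only the JDK's javax.xml.validation. The class name
LabelSchemaDemo and the one-element schema are made up for illustration; this
is not the UIMA schema, just the same xs:Name vs. xs:string distinction applied
to the label value above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of UIMA.
public class LabelSchemaDemo {

    // Builds a tiny schema whose only element, <label>, has the given
    // built-in type (for example "xs:Name" or "xs:string").
    public static String schemaFor(String xsdType) {
        return "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
             + "<xs:element name=\"label\" type=\"" + xsdType + "\"/>"
             + "</xs:schema>";
    }

    // Returns true if <label>labelValue</label> validates against that schema.
    // The value is inserted without XML escaping, so keep it to plain text.
    public static boolean isValid(String xsdType, String labelValue) throws Exception {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(
            new StreamSource(new StringReader(schemaFor(xsdType))));
        String doc = "<label>" + labelValue + "</label>";
        try {
            // With no ErrorHandler set, validation errors are thrown.
            schema.newValidator().validate(new StreamSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String label = "Annotation Bag Index";
        System.out.println("xs:Name   -> " + isValid("xs:Name", label));
        System.out.println("xs:string -> " + isValid("xs:string", label));
    }
}
```

Run as is, this should report the label as invalid for xs:Name and valid for
xs:string; a label without blanks, such as "AnnotationBagIndex", passes both.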

I'm thinking that this schema spec should be relaxed to "string".
Pros: existing descriptors that previously worked continue to work when schema
validation is enabled.
Cons: not sure there are any.  I don't think this label is used in any way other
than as an "ID".

So, I'm planning to modify the schema in this manner.  Any other opinions?

-Marshall