Posted to user@uima.apache.org by Roberto Franchini <ro...@gmail.com> on 2007/08/29 16:50:05 UTC

How to test a CollectionReader

Hi,
I'm trying to test a CollectionReader, but it seems that the
TypeSystem isn't initialized since I get this:

org.apache.uima.cas.CASRuntimeException: JCas type
"it.celi.types.SourceDocumentInformation" used in Java code,  but was
not declared in the XML type descriptor.
	at org.apache.uima.jcas.impl.JCasImpl.getType(JCasImpl.java:397)
	at org.apache.uima.jcas.cas.TOP.<init>(TOP.java:92)
	at org.apache.uima.jcas.cas.AnnotationBase.<init>(AnnotationBase.java:53)
	at org.apache.uima.jcas.tcas.Annotation.<init>(Annotation.java:54)
	at it.celi.types.Annotation.<init>(Annotation.java:41)
	at it.celi.types.SourceDocumentInformation.<init>(SourceDocumentInformation.java:41)
	at it.celi.components.collection.RecursiveFileSytemCollectionReader.getCurrent(RecursiveFileSytemCollectionReader.java:192)
	at it.celi.components.collection.RecursiveFileSytemCollectionReader.getNext(RecursiveFileSytemCollectionReader.java:158)
	at it.celi.components.retriever.TestRecursiveFileSytemCollectionReader.testGetNext(TestRecursiveFileSytemCollectionReader.java:70)

..........


The test looks something like this:
....
	private RecursiveFileSytemCollectionReader rfcr;

	@Before
	public void setUp() throws Exception {
		
		File file;

		file = FileUtil.getFileAsResource("RecursiveFileSystemCollectionReader.xml");
		
		ResourceSpecifier aSpecifier = UIMAFramework.getXMLParser()
				.parseCollectionReaderDescription(new XMLInputSource(file));

		rfcr = (RecursiveFileSytemCollectionReader) UIMAFramework.produceCollectionReader(aSpecifier);

	}

	@Test
	public void getNext() {
		try {

			CasManager casManager = rfcr.getCasManager();
			casManager.defineCasPool("pool", 2, null);
			

			while (rfcr.hasNext()) {

				CAS cas = casManager.getCas("pool");
		
				rfcr.getNext(cas);

				Type fileLocType = cas.getTypeSystem()
						.getType(SourceDocumentInformation.class.getName());
				Feature fileNameFeat = fileLocType.getFeatureByBaseName("uri");
				FSIterator it = cas.getAnnotationIndex(fileLocType).iterator();
				FeatureStructure fileLoc = it.get();
.......... a lot of catch

The collection reader's XML descriptor declares the type system and
the output capabilities. And when it is deployed in a CPE, this CR works!

Thanks in advance,
Roberto

-- 
Roberto Franchini
CELI s.r.l.  (http://www.celi.it) - C.so Moncalieri 21 - 10131 Torino - ITALY
Tel +39-011-6600814 - Fax +39-011-6600687
jabber:ro.franchini@gmail.com skype:ro.franchini

Re: How to test a CollectionReader (in Groovy)

Posted by Philip Ogren <ph...@ogren.info>.
I didn't follow the thread closely so I may be wandering here - but I 
thought I would volunteer my working strategy for testing collection 
readers in Groovy even though it may be overly simplistic for many 
situations. 

My unit tests for our collection readers start off with one line:

JCas jCas = TestsUtil.processCR("desc/test/myCRdesc.xml", 0)          


followed immediately by assertions of what I expect to be in the JCas.

The method TestsUtil.processCR looks like this:

static JCas processCR(String descriptorFileName, int documentNumber)
    {
        XMLInputSource xmlInput = new XMLInputSource(new File("desc/annotators/EmptyAnnotator.xml"))
        ResourceSpecifier specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(xmlInput)
        AnalysisEngine analysisEngine = UIMAFramework.produceAnalysisEngine(specifier)
        JCas jCas = analysisEngine.newJCas()
        xmlInput = new XMLInputSource(new File(descriptorFileName))
        specifier = UIMAFramework.getXMLParser().parseResourceSpecifier(xmlInput)
        CollectionReader collectionReader = UIMAFramework.produceCollectionReader(specifier)       

        for(i in 0..documentNumber)
        {
            jCas.reset()
            collectionReader.getNext(jCas.getCas())
        }
        return jCas
    }



where EmptyAnnotator.xml is a descriptor file for an analysis engine 
that does nothing. Its annotator class is as follows:

public class EmptyAnnotator extends JCasAnnotator_ImplBase{
    public void process(JCas jCas) throws AnalysisEngineProcessException    {
        //this annotator does nothing!
    }
}


I hope this is helpful.


Re: How to test a CollectionReader

Posted by Marshall Schor <ms...@schor.com>.
Roberto Franchini wrote:
> On 8/30/07, Marshall Schor <ms...@schor.com> wrote:
>   
>> Hi Roberto -
>>
>> After some tracing and tracking down, it seems the method you were using
>> to create your CAS pool
>> needs an additional statement.  Here's the original, which unfortunately
>> doesn't set up the type system
>> from the Collection Reader into the CASes (normally this isn't a
>> problem, because this happens elsewhere
>> in the setup to use a collection reader within a CPE):
>>     
> [cut]
>
> Thanks a lot, now it works!
> I was on the right path, but I didn't find a way to set the type system.
> Maybe it could be useful for others to include the test in future
> UIMA releases. I know, testing a FS collection reader seems silly, but
> we are going to implement collection readers for other sources, such
> as the net and databases. I think good tests will help us write better
> code.
>   
I completely agree with you on the value of good tests - thanks for the
motivation!

-Marshall
> Regards,
> Roberto
>
>   


Re: How to test a CollectionReader

Posted by Roberto Franchini <ro...@gmail.com>.
On 8/30/07, Marshall Schor <ms...@schor.com> wrote:
> Hi Roberto -
>
> After some tracing and tracking down, it seems the method you were using
> to create your CAS pool
> needs an additional statement.  Here's the original, which unfortunately
> doesn't set up the type system
> from the Collection Reader into the CASes (normally this isn't a
> problem, because this happens elsewhere
> in the setup to use a collection reader within a CPE):
[cut]

Thanks a lot, now it works!
I was on the right path, but I didn't find a way to set the type system.
Maybe it could be useful for others to include the test in future
UIMA releases. I know, testing a FS collection reader seems silly, but
we are going to implement collection readers for other sources, such
as the net and databases. I think good tests will help us write better
code.
Regards,
Roberto


Re: How to test a CollectionReader

Posted by Marshall Schor <ms...@schor.com>.
Hi Roberto -

After some tracing and tracking down, it seems the method you were using
to create your CAS pool needs an additional statement.  Here's the
original, which unfortunately doesn't set up the type system from the
Collection Reader into the CASes (normally this isn't a problem, because
this happens elsewhere in the setup to use a collection reader within a
CPE):

            CasManager casManager = fcr.getCasManager();
            casManager.defineCasPool("pool", 2, null);

and here's the one which will set up the pool with the right type system:

            CasManager casManager = fcr.getCasManager();
            // Added line:
            casManager.addMetaData((ProcessingResourceMetaData) fcr.getMetaData());
            casManager.defineCasPool("pool", 2, null);

This lets the casManager know what the type system is, when setting up
the pool.
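
As a rough illustration of why the order of these two calls matters, here
is a toy model in plain Java. ToyCas, ToyCasManager and poolKnowsType are
hypothetical stand-ins invented for this sketch, not UIMA classes, and the
real CasManager does much more. The sketch only assumes that the CASes in a
pool are built from whatever metadata has been registered at the moment the
pool is defined.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-ins, NOT UIMA classes: they only mimic the ordering issue. */
final class CasPoolOrderingSketch {

    /** A pretend CAS that knows a fixed set of type names. */
    static final class ToyCas {
        private final Set<String> typeNames;

        ToyCas(Set<String> typeNames) {
            // Snapshot: later registrations do not reach an existing CAS.
            this.typeNames = Set.copyOf(typeNames);
        }

        boolean declares(String typeName) {
            return typeNames.contains(typeName);
        }
    }

    /** A pretend CAS manager: pools use the metadata registered so far. */
    static final class ToyCasManager {
        private final Set<String> registeredTypes = new HashSet<>();
        private final Map<String, Deque<ToyCas>> pools = new HashMap<>();

        void addMetaData(Set<String> typeNames) {
            registeredTypes.addAll(typeNames);
        }

        void defineCasPool(String name, int size) {
            Deque<ToyCas> pool = new ArrayDeque<>();
            for (int i = 0; i < size; i++) {
                pool.add(new ToyCas(registeredTypes));
            }
            pools.put(name, pool);
        }

        ToyCas getCas(String name) {
            return pools.get(name).remove();
        }
    }

    /** Defines a pool with or without registering metadata first. */
    static boolean poolKnowsType(boolean registerMetaDataFirst, String typeName) {
        ToyCasManager casManager = new ToyCasManager();
        if (registerMetaDataFirst) {
            casManager.addMetaData(Set.of(typeName));
        }
        casManager.defineCasPool("pool", 2);
        return casManager.getCas("pool").declares(typeName);
    }

    public static void main(String[] args) {
        String type = "it.celi.types.SourceDocumentInformation";
        System.out.println(poolKnowsType(false, type));
        System.out.println(poolKnowsType(true, type));
    }
}
```

In this toy model the first call prints false and the second prints true,
which mirrors the effect of adding the metadata before defining the pool.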

There's one other fix needed, also: 

At the end of the iteration is a statement to release the CAS back to the
pool.  The form you are using currently only works when the CAS view is
the base view (we'll fix this...).  There is another form which works and
is recommended.  Here again are the original and the new way:

       casManager.releaseCas(cas);   // fails because "cas" isn't the base cas view
 
       cas.release();   // works
 
With those 2 fixes, your test should work.

-Marshall              



Re: How to test a CollectionReader

Posted by Tong Fin <to...@gmail.com>.
The type "SourceDocumentInformation" is imported by
FileSystemCollectionReader as follows:

    <import name="org.apache.uima.examples.SourceDocumentInformation"/>

Please make sure that the "imported" XML file is in the classpath.

- Tong


Re: How to test a CollectionReader

Posted by Marshall Schor <ms...@schor.com>.
Thanks, Roberto -

I'm able to reproduce this and am investigating....

-Marshall


Re: How to test a CollectionReader

Posted by Roberto Franchini <ro...@gmail.com>.
On 8/30/07, Marshall Schor <ms...@schor.com> wrote:
> Hi Roberto -
>
> This might be caused by the Collection Reader Descriptor's type system
> missing the particular type
> "it.celi.types.SourceDocumentInformation";  can you verify it is defined
> (and spelled correctly, etc.) as part of this particular Collection
> Reader Descriptor?
>
> If this type was defined in another descriptor that is deployed with
> this collection reader, it would work when the whole CPE was run,
> because all the types from the various descriptors that make up the CPE
> pipeline are merged.
>

I've tried to test the FileSystemCollectionReader provided in
uimaj-tools and I got the same error.
Attached (I hope it's possible to send attachments) is the test I wrote.
Regards,
Roberto


Re: How to test a CollectionReader

Posted by Marshall Schor <ms...@schor.com>.
Hi Roberto -

This might be caused by the Collection Reader Descriptor's type system
missing the particular type
"it.celi.types.SourceDocumentInformation";  can you verify it is defined
(and spelled correctly, etc.) as part of this particular Collection
Reader Descriptor?

If this type was defined in another descriptor that is deployed with
this collection reader, it would work when the whole CPE was run,
because all the types from the various descriptors that make up the CPE
pipeline are merged.

-Marshall
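
To make the merging hypothesis above concrete, here is a toy sketch in
plain Java. It does not use the UIMA API: plain sets of type names stand in
for the type systems of descriptors, and the class name, the merge method
and the idea of a second descriptor that declares the type are all
assumptions made for illustration. It only shows how a type missing from
one descriptor could still be present once several descriptors are merged.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: sets of type names stand in for UIMA type systems. */
final class TypeSystemMergeSketch {

    /** Returns the union of the type names declared by each descriptor. */
    static Set<String> merge(List<Set<String>> declaredTypesPerDescriptor) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> declared : declaredTypesPerDescriptor) {
            merged.addAll(declared);
        }
        return merged;
    }

    public static void main(String[] args) {
        String needed = "it.celi.types.SourceDocumentInformation";

        // Hypothetical contents: the reader's descriptor lacks the type,
        // while some other descriptor in the pipeline declares it.
        Set<String> readerTypes = Set.of("uima.tcas.DocumentAnnotation");
        Set<String> otherDescriptorTypes = Set.of(needed);

        System.out.println("reader alone: " + readerTypes.contains(needed));
        System.out.println("merged pipeline: "
                + merge(List.of(readerTypes, otherDescriptorTypes)).contains(needed));
    }
}
```

Under these assumptions the reader alone does not declare the type, while
the merged set does, which is the situation the question above is meant to
rule out.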
