Posted to user@uima.apache.org by Paul Browne <pa...@firstpartners.net> on 2016/11/26 22:26:25 UTC

Example of Tika FilesystemReader working with uimaFIT?

​Folks,

I'm wondering if there are any examples of the UIMA Tika FilesystemReader
component working with uimaFIT?

I've been playing around with it and getting several errors (probably my
fault), but I can't seem to find a similar example on the website or the
mailing list despite searching. I have downloaded and compiled the source
(UIMA, UIMA tools, examples). The existing code is clear, but when I try to
combine the pieces to do what is outlined below, I get errors.

Aim is to:
1) Read a collection of documents using the UIMA Tika FilesystemReader
   component.
2) Later, do more serious POS tagging.

The code is:

    CollectionReader readerEngine = CollectionReaderFactory.createCollectionReader(
        FileSystemCollectionReader.class,
        FileSystemCollectionReader.PARAM_INPUTDIR, "C:\\Somelocation",
        FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
        FileSystemCollectionReader.PARAM_LANGUAGE, "EN");

    AggregateBuilder builder = new AggregateBuilder();

    SimplePipeline.runPipeline(readerEngine, builder.createAggregate());

And the error is:

    Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
    type "org.apache.uima.examples.SourceDocumentInformation" used in Java
    code, but was not declared in the XML type descriptor.

A similar error is discussed at the link below, but it is not clear to me
how to implement the suggested fix:
http://user.uima.apache.narkive.com/b940cOrO/how-to-test-a-collectionreader

Any suggestions or pointers on the web that I should be looking at?

Thanks for your help

Paul

Re: Example of Tika FilesystemReader working with uimaFIT?

Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi,

you can set up a "types.txt" file as documented here [1] to
point uimaFIT to the type system descriptor that contains the missing
annotation type.
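
As a rough sketch of this first option: uimaFIT looks on the classpath for
a file at META-INF/org.apache.uima.fit/types.txt, where each line is a
location pattern for type system descriptors. The single line below
assumes that the descriptor declaring SourceDocumentInformation is on your
classpath at org/apache/uima/examples/SourceDocumentInformation.xml (for
instance via the UIMA examples jar). That location is an assumption, so
adjust the pattern to wherever the descriptor actually lives in your
project.

```text
classpath*:org/apache/uima/examples/SourceDocumentInformation.xml
```

With that file in place, uimaFIT should pick up the type automatically when
it builds the reader, so the call to createCollectionReader would not need
an explicit type system argument.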

Alternatively, you can construct or load your type system description
in code and pass it after the class argument to createCollectionReader,
e.g.

  TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath(
    "path/to/your/typesystem.xml");
  CollectionReader readerEngine = CollectionReaderFactory.createCollectionReader(
    FileSystemCollectionReader.class, tsd, ... params ...);

Cheers,

-- Richard

[1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem
