Posted to dev@ctakes.apache.org by Yi-Wen Liu <yi...@usc.edu> on 2015/11/07 06:59:15 UTC

CR descriptor

Hi,

I am looking for the main collection reader (CR) in cTAKES so that I can
scale out on UIMA DUCC. In desc/ctakes-core/desc/collection_reader/
there are multiple CR XML files, and I am not sure which one should be
specified in DUCC's job file. Are they all necessary in a cTAKES job, or
are some of them provided only for reference?

I am not familiar with the cTAKES structure, so I hope somebody can help
me out.

Thanks,
Yi-Wen

Re: CR descriptor

Posted by Yi-Wen Liu <yi...@usc.edu>.
Hi,

Thanks! That helps a lot!

My data is in plain text files. Can I assume the other descriptors work the
same way? There are also multiple AE XML files in
desc\ctakes-core\desc\analysis_engine and multiple CC XML files in
desc\ctakes-core\desc\cas_consumer, and I only need one of each.
In my case, would those be AggregateAE and
FilesInDirectoryCasConsumer.example?

Thanks,
Yi-Wen

On Sat, Nov 7, 2015 at 3:46 AM, Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> Hi Yi-Wen,
> There are different collection readers for different data sources, and we
> usually try to give them descriptive names.
> FilesInDirectoryCollectionReader is one of the most useful ones: it looks
> for text files in a directory and puts one file in each CAS. If your data
> is in that format, or is easy to convert to that format, that's probably a
> good starting point.
> Tim

Re: CR descriptor

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
Hi Yi-Wen,
There are different collection readers for different data sources, and we
usually try to give them descriptive names.
FilesInDirectoryCollectionReader is one of the most useful ones: it looks
for text files in a directory and puts one file in each CAS. If your data
is in that format, or is easy to convert to that format, that's probably a
good starting point.
Tim
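
To illustrate the "one file in each CAS" behaviour described above, here is
a minimal conceptual sketch in Python. It is not cTAKES or UIMA code: the
Document class is a hypothetical stand-in for a CAS, read_directory is a
hypothetical stand-in for the collection reader, and filtering on a ".txt"
suffix is an assumption made only for this example.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, NamedTuple


class Document(NamedTuple):
    """Hypothetical stand-in for a CAS: one file's text plus its file name."""
    source: str
    text: str


def read_directory(directory: Path, suffix: str = ".txt") -> Iterator[Document]:
    """Yield one Document per matching file, in sorted file-name order.

    The suffix filter is an assumption of this sketch, not a statement
    about how the real reader selects files.
    """
    for path in sorted(directory.glob(f"*{suffix}")):
        if path.is_file():
            yield Document(source=path.name,
                           text=path.read_text(encoding="utf-8"))


def make_demo_docs() -> List[Document]:
    """Create a throwaway directory with sample notes and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "note1.txt").write_text("Patient denies chest pain.",
                                        encoding="utf-8")
        (root / "note2.txt").write_text("History of hypertension.",
                                        encoding="utf-8")
        # Not a text note, so the suffix filter skips it.
        (root / "ignore.xml").write_text("<x/>", encoding="utf-8")
        return list(read_directory(root))


if __name__ == "__main__":
    for doc in make_demo_docs():
        print(doc.source, len(doc.text))
```

Each yielded Document corresponds to one input file, which is the unit of
work the downstream analysis engine and consumer would then process.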
