Posted to user@ctakes.apache.org by Justin Zhang <ju...@gmail.com> on 2015/08/06 00:38:36 UTC

how to run i2b2 data

Hello everyone,

I am running cTAKES on i2b2 data:
https://www.i2b2.org/NLP/DataSets/Main.php

Each XML file contains multiple patient records. I am able to split the
records into one file per patient and process them with "runCPE.sh".

Is there a way to convert such an XML file into a format that cTAKES
accepts, process it as a single input file, and generate a single output
file (with results labelled by patient id)? For example, each patient id has
a "smoking status".

Thanks,

-- 
Justin

RE: how to run i2b2 data

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Justin,

If you check out the source code, you should be able to find that class (org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader) in the ctakes-core component.

Sean

-----Original Message-----
From: Justin Zhang [mailto:justinzhang.xl@gmail.com] 
Sent: Friday, August 07, 2015 10:45 AM
To: dev@ctakes.apache.org
Subject: Re: how to run i2b2 data

Thanks, Sean, for your understanding; I am hopeful now.

Where is the best place to start looking in order to "create a collection reader that works similarly to org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader"?

Justin

Re: how to run i2b2 data

Posted by Justin Zhang <ju...@gmail.com>.

Thanks, Sean, for your understanding; I am hopeful now.

Where is the best place to start looking in order to "create a collection
reader that works similarly to
org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader"?

Justin

On Wed, Aug 5, 2015 at 7:24 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Justin,
>
> A shot in the dark:
> You could create a collection reader that works similarly to
> org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader, but instead of
> grabbing all of the files in a directory it grabs all the records parsed
> from a single .xml and runs a pipeline per record.  Basically, swap a
> directory for an .xml, a text file for an xml element containing a record.
> Somebody out there might have something that already does as much.
>
> Sean

RE: how to run i2b2 data

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Justin,

A shot in the dark:
You could create a collection reader that works similarly to org.apache.ctakes.core.cr.FilesInDirectoryCollectionReader, but instead of grabbing all of the files in a directory, it grabs all the records parsed from a single .xml file and runs a pipeline per record. Basically, swap a directory for an .xml file, and a text file for an XML element containing a record.
Somebody out there might have something that already does as much.

Sean
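
A minimal sketch of the record-splitting half of this suggestion, in plain Java with no cTAKES or UIMA dependencies. It only shows how one multi-record .xml file might be turned into a list of per-patient records. The class XmlRecordSplitter, its PatientRecord type, and the element and attribute names (RECORD, ID, TEXT) are assumptions made for illustration, not part of cTAKES, and should be checked against the actual i2b2 file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of cTAKES: splits one multi-record XML file
// into per-patient records that a custom collection reader could iterate over.
final class XmlRecordSplitter {

    // One patient record: the value of its id attribute and its note text.
    static final class PatientRecord {
        final String id;
        final String text;

        PatientRecord(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // ASSUMPTION: each record is a <RECORD ID="..."> element with a <TEXT> child.
    static List<PatientRecord> split(InputStream xml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(xml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse record XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("RECORD");
        List<PatientRecord> records = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element record = (Element) nodes.item(i);
            NodeList texts = record.getElementsByTagName("TEXT");
            // A record without a <TEXT> child yields an empty note.
            String text = texts.getLength() > 0
                    ? texts.item(0).getTextContent().trim()
                    : "";
            records.add(new PatientRecord(record.getAttribute("ID"), text));
        }
        return records;
    }
}
```

A collection reader modelled on FilesInDirectoryCollectionReader could then hand out one PatientRecord per document instead of one file per document, using the record id as the document id so that results stay labelled by patient. That wiring is left out here because it depends on the cTAKES and UIMA versions in use.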
