Posted to dev@ctakes.apache.org by Pei Chen <ch...@apache.org> on 2015/07/24 17:12:55 UTC
Combining Knowledge- and Data-driven Methods for De-identification of
Clinical Narratives
Hi,
Re: http://www.sciencedirect.com/science/article/pii/S1532046415001392
This is very interesting work and I think it would be very valuable
for the general community. Is this something you may be interested
in contributing to, or sharing the code with, the Apache cTAKES
community?
Thanks,
Pei
Re: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
Posted by John Green <jo...@gmail.com>.
Sounds great
On Thu, Jul 30, 2015 at 7:21 AM, Ted Strall <ts...@yahoo.com.invalid>
wrote:
> How / when can we go about getting started on this?
RE: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hi Ted/Jay,
Thanks for suggesting this and taking it up.
What information will be needed to accomplish what you were thinking?
Just thinking aloud here:
1) Test data. I think John Green crafted about 20-30 notes in the data folder. We can use these as a starting point.
2) Code to run through the various components and pipelines?
3) Environments to run on different OS/hardware combinations, etc.?
4) Create a Gold Standard format (Knowtator and/or Anafora). cTAKES already has existing readers for those. [For ML-based examples?]
I think there is a ctakes-regression project that we can probably just overwrite with new regression testing code.
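As a rough illustration of item 4, a regression check against a gold standard could be sketched as below. This is only a sketch under assumptions: the `Annotation` schema, the `score` and `check_regression` helpers, the labels, and the baseline value are all hypothetical, and nothing here uses the cTAKES API or the actual Knowtator/Anafora formats. It scores system output against gold annotations by exact span-and-label match and flags a regression when F1 falls below a stored baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One labeled span: character offsets plus a type label (hypothetical schema)."""
    begin: int
    end: int
    label: str


def score(gold, predicted):
    """Exact-match precision, recall, and F1 between two sets of annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def check_regression(gold, predicted, baseline_f1, tolerance=0.01):
    """Return (passed, scores); passed is False if F1 drops more than
    `tolerance` below the stored baseline."""
    scores = score(gold, predicted)
    return scores["f1"] >= baseline_f1 - tolerance, scores


# Toy data standing in for one note's gold and system annotations.
gold = {
    Annotation(0, 8, "NAME"),
    Annotation(20, 30, "DATE"),
    Annotation(41, 52, "PHONE"),
}
predicted = {
    Annotation(0, 8, "NAME"),
    Annotation(20, 30, "DATE"),
    Annotation(60, 65, "NAME"),  # spurious span, not in the gold set
}

passed, scores = check_regression(gold, predicted, baseline_f1=0.60)
print(passed, scores)
```

Exact matching is the strictest choice; a real framework might also want overlap-based matching and per-label scores, and the baseline would presumably come from the last accepted run.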
From: Ted Strall [mailto:tstrall@yahoo.com]
Sent: Thursday, July 30, 2015 9:21 AM
To: Chen, Pei; dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
How / when can we go about getting started on this?
Re: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
Posted by Ted Strall <ts...@yahoo.com.INVALID>.
How / when can we go about getting started on this?
From: "Chen, Pei" <Pe...@childrens.harvard.edu>
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>; Ted Strall <ts...@yahoo.com>
Sent: Friday, July 24, 2015 12:52 PM
Subject: RE: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Ted- Welcome to the community!
I think this would be a great enhancement.
Jay- I think the BigTop folks did a lot with the smoke and integration tests... Do you know how they did it? Something we can reuse?
--Pei
RE: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Ted- Welcome to the community!
I think this would be a great enhancement.
Jay- I think the BigTop folks did a lot with the smoke and integration tests... Do you know how they did it? Something we can reuse?
--Pei
-----Original Message-----
From: Ted Strall [mailto:tstrall@yahoo.com.INVALID]
Sent: Friday, July 24, 2015 12:31 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
I would be interested in helping to develop / maintain a regression testing framework for that.
I'm new to cTAKES (and just recently started stalking the dev mailing list), but I've been a software engineer for 20 years and have done a lot of the framework automation work that will probably be required. As I write this, I am working on an automated integration test that will run on Jenkins. It fires up and loads an H2 database, a Solr instance, an in-house indexing pipeline and an in-house search service, indexes 10k documents, and executes and evaluates some canned queries before shutting itself down.
I'm also working on an MS in Predictive Analytics and I am interested in applying machine learning and NLP to medical informatics, so I would welcome the chance to get dirty with that side of things, also.
Re: Combining Knowledge- and Data-driven Methods for
De-identification of Clinical Narratives
Posted by Ted Strall <ts...@yahoo.com.INVALID>.
I would be interested in helping to develop / maintain a regression testing framework for that.
I'm new to cTAKES (and just recently started stalking the dev mailing list), but I've been a software engineer for 20 years and have done a lot of the framework automation work that will probably be required. As I write this, I am working on an automated integration test that will run on Jenkins. It fires up and loads an H2 database, a Solr instance, an in-house indexing pipeline and an in-house search service, indexes 10k documents, and executes and evaluates some canned queries before shutting itself down.
I'm also working on an MS in Predictive Analytics and I am interested in applying machine learning and NLP to medical informatics, so I would welcome the chance to get dirty with that side of things, also.
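The start / index / query / shut down lifecycle described above can be sketched roughly as follows. Everything here is an assumption made for illustration: `FakeService` and `run_integration_test` are hypothetical stand-ins, the "index" is a plain dictionary, and no real H2, Solr, or Jenkins calls are involved. The point is only the shape: services are started in order, canned queries are checked against expected results, and teardown always runs.

```python
from contextlib import ExitStack


class FakeService:
    """Hypothetical stand-in for a real dependency (database, search server)."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f"start {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(f"stop {self.name}")
        return False  # do not swallow test failures


def run_integration_test(documents, canned_queries):
    """Start services, index documents, run canned queries, always shut down."""
    log = []
    failures = []
    with ExitStack() as stack:
        # Services are stopped in reverse order when the block exits,
        # even if indexing or a query raises.
        for name in ("database", "search"):
            stack.enter_context(FakeService(name, log))

        # Toy "indexing": lower-case each document's text.
        index = {doc_id: text.lower() for doc_id, text in documents.items()}

        # Evaluate each canned query against its expected set of document ids.
        for query, expected_ids in canned_queries.items():
            hits = {doc_id for doc_id, text in index.items()
                    if query.lower() in text}
            if hits != expected_ids:
                failures.append((query, expected_ids, hits))
    return failures, log


documents = {
    1: "Patient denies chest pain.",
    2: "Chest X-ray is clear.",
    3: "No fever.",
}
canned_queries = {"chest": {1, 2}, "fever": {3}}

failures, log = run_integration_test(documents, canned_queries)
print(failures, log)
```

Using `ExitStack` keeps teardown in one place, so a failing query cannot leave a service running on the build machine.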
From: Jay Vyas <ja...@gmail.com>
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Sent: Friday, July 24, 2015 10:44 AM
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Yes, this is very interesting work.
- If we have access to a large corpus of de-identified records, we can regression test the cTAKES platform.
- I can help collaborate on a regression testing framework if someone else wants to help maintain it.
Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
Posted by Jay Vyas <ja...@gmail.com>.
Yes, this is very interesting work.
- If we have access to a large corpus of de-identified records, we can regression test the cTAKES platform.
- I can help collaborate on a regression testing framework if someone else wants to help maintain it.
> On Jul 24, 2015, at 11:12 AM, Pei Chen <ch...@apache.org> wrote:
>
> Hi,
> Re: http://www.sciencedirect.com/science/article/pii/S1532046415001392
> This is very interesting work and I think it would be very valuable
> for the general community. Is this something you may be interested
> in contributing to, or sharing the code with, the Apache cTAKES
> community?
> Thanks,
> Pei