Posted to dev@ctakes.apache.org by "Savova, Guergana" <Gu...@childrens.harvard.edu> on 2020/06/29 15:33:36 UTC

RE: ApacheCon 2020 and cTAKES

Hi Sean,

Thank you for bringing ApacheCon to the attention of cTAKES-ers!

In my opinion, your list of ideas for presentations/videos captures topics of high interest in our community, topics that we have seen many discussions on in the cTAKES lists. Thank you for volunteering to be the point of contact!

It is a short two-week timeline, but we as a community can pull it off.

Looking forward to engaging discussions on the list. I am including the user list as well, since there are many there who might be interested.

--
Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Computational Health Informatics Program (CHIP)
Boston Children's Hospital and Harvard Medical School
401 Park, 5th floor East, 5523.3
Boston, MA 02215
Tel: (617) 919-2972
Fax: (617) 730-0817
Guergana.Savova@childrens.harvard.edu


-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Monday, June 29, 2020 11:02 AM
To: dev@ctakes.apache.org
Subject: ApacheCon 2020 [Bulk] [EXTERNAL] [SUSPICIOUS] [Bulk]



Hi all,


General admission to ApacheCon 2020 is free:  https://hopin.to/events/apachecon-home


I think that the price of admission and travel costs have held back ctakes users from attending past conferences, and the lack of a sizable audience has diminished the comparative value of ctakes presentations in the eyes of ApacheCon planners.  Because of the "at home" nature of this year's conference, an app with a smaller presence and less hip buzz has a better chance of grabbing some time on the schedule.


The predetermined tracks are still an ill fit when it comes to the nature of ctakes.  https://apachecon.com/acah2020/cfp.html

However, I think that we can still use this opportunity to deliver some powerful introduction and training videos, as well as user stories and clinical project applications.  Perhaps we can argue for an NLP track and do some coordination with projects like OpenNLP and UIMA.


There are a scant two weeks to come up with presentations, and less time to propose a track/topic.  The call for presentations ends July 13th.  That is a deadline that requires immediate attention by anybody who wants to show off their project or expertise.


Apache wants to have a single point of contact for each project, and I am volunteering to be that person for ctakes.   I am volunteering, not laying claim, so if you think that you are a better fit for the position please let me know.


I have written some ideas for presentations below.  If you want to take one (modify as you like) then please write me and post to the devlist.  If you have ideas for another presentation topic, please let me and the devlist know - even if you aren't volunteering to do the presentation yourself, perhaps somebody else will.  Again ... two weeks.


Thank you,

Sean



*  The following talk ideas are by and large directed toward training.  That does not mean that topics should stay within that scope.


=================================================================


Customizing cTAKES: First Principles

Built using Apache UIMA, cTAKES is modular and extensible.  Why is it frequently treated as a black box?  Is it lack of need, sparsity of resources, or simply fear of the unknown?

This is a quick start tutorial on adding custom elements to cTAKES.  We illustrate creating simple classes to input, process and output data.  This involves a concise overview of Apache uimaFIT and the cTAKES type system, as well as building a UIMA pipeline using piper files.


=================================================================


Loading a shippable with cTAKES DockHand

Customizing a simple pipeline need not be left to cTAKES experts.  Making a cTAKES installation need not be confined to source code checkouts or lengthy multi-stage binary downloads.

We introduce cTAKES DockHand, a compact single-file installation tool that allows one to construct custom pipelines as well as local installations, Rest Services and Dockerfiles.


==================================================================


Secret Engines of cTAKES

The cTAKES default natural language processing pipeline is a standard in the clinical research community.  What is past that standard?  While the default clinical pipeline uses almost 20 engines, there are dozens more in various cTAKES modules.

We present and discuss the top 10 annotation engines you never knew you had.


====================================================


Does cTAKES Know "The Best Words"?

Named Entity Recognition is at the core of all complete natural language processing tools.  Out of the box cTAKES uses a dictionary containing part of the Unified Medical Language System (UMLS) that covers most common clinical terms.  But it also comes with a custom dictionary creator.

If you think that your clinical research is directed, then you should probably have a directed dictionary.  UMLS subsets, non-English dictionaries and novel custom dictionaries have all been successfully used with cTAKES.

This is an overview of cTAKES named entity recognition with the essential what, why and how of custom dictionaries as the centerpiece.


====================================================

Academic Software: Performance or Performance?

A conundrum faced by all academic software projects is how to make the best of a small amount of resources.  Clinical natural language processing projects that use cTAKES are not exempt, and balancing accuracy of results against speed of processing often becomes central when it is time to put things into production (or just please the boss).

More than a history of cTAKES and its evolutionary efforts in precision, speed and usability, this presentation contains examples of how to best utilize each aspect.


================================================================


Diet cTAKES

One reason cTAKES is a popular framework in clinical natural language processing tools is its use of Apache Maven for project management.  Navigating cTAKES dependencies can be difficult, leading to a common practice of consuming the whole project.  Much of what ends up in your system may lead to unnecessary bloat.

Going piecemeal through the values and weights of cTAKES modules and resources, this presentation will assist any cTAKES user in trimming project bulk from gigabytes to megabytes.


================================================================


cTAKES Saved my Life

The title is inappropriate when it comes to healthcare in practice.  However, I used Apache cTAKES for my clinical research project on ________, and its [versatile / comprehensive / speedy / ?] nature was important in completing things [on time /  accurately / ?].

We share our real-world experiences with using cTAKES, discuss why we chose it, issues we faced and how we overcame unexpected problems.


================================================================


Large-scale cTAKES, an Installation Story

At our _____ facility, we needed to process _____ [patients / notes / term lists / ?] on a ______ system.

We present a real-world application of cTAKES on a large scale, our needs for _____ input and ____ output.  We compare and contrast cTAKES with other [clinical] NLP platforms that we tried and explain why we chose [it / another] in the end.

We will also share the novel [techniques / code / integration] that we used for the success of our installation.


================================================================


My Engine is Faster than Yours

We have created a cTAKES annotation engine that performs the task of _____.   This is [newer / faster / more comprehensive] than existing engines in [cTAKES / other].

We will present [numbers / usage / capabilities / i/o] of the engine and its [model / data].
We will also commit the code and documentation to Apache cTAKES.


================================================================


cTAKES on the Catwalk

We have created a Machine Learning model that can be used in cTAKES for ______.  The model uses the third party ______ for [newer / faster / more comprehensive] results.

We will present the essentials of model creation as well as [numbers , usage , capabilities / i/o ] of our model.   We will also advocate for the third party _____ and how we integrated it with cTAKES.
We will also commit the code [model] and documentation to Apache cTAKES.