Posted to user@ctakes.apache.org by "King, Christopher" <ch...@wustl.edu> on 2018/08/08 20:05:17 UTC

Adding section regex to ccda_sections.txt

Hello, I am trying to use cTAKES to sectionize MIMIC II notes and run NER within each section. The RegexSectionizer seems to be the most applicable (the notes generally do not have section dividers). However, the section labels in ccda_sections.txt are quite limited. For example, it does not detect "HPI", "FH", "PMH", or many other commonly used section identifiers. Other systems include much longer lists; for example, VU's sectag https://orbit.nlm.nih.gov/browse-repository/software/nlp-information-extraction/negation-resolution/41-sectag provides ~6800 synonyms and variants of section labels.

I would be happy to add them to RegexSectionizer; however, it is not obvious how to generate the expected format. The sectag database contains a "tree" value (e.g. 5.28 for "hpi"), which is not the same as the HL7-CCDA ID. Some, but not all, entries have LOINC IDs. Similarly, I am not sure what Clarity is using for its section tree.

1) Does it matter what string I use for the id (other than avoiding collisions)? Does anything downstream actually use the HL7 string?
2) If so, how does one generate the HL7 string?
3) Is there some easier way to do this with e.g. BsvRegexSectionizer (which is essentially undocumented)?
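
To make the questions concrete, here is a rough Python sketch of the conversion I have in mind. It is only an illustration under stated assumptions: the sample rows, the `SECTAG_` id prefix, the helper names, and the `id||name||regex` output layout are all hypothetical and are not taken from the actual ccda_sections.txt format. Only the 5.28 / "hpi" pairing comes from sectag; the other tree values are invented placeholders.

```python
import re

# Hypothetical sample rows: (sectag tree value, canonical name, synonyms).
# Only the 5.28 / "hpi" pairing is real; the other tree values are invented.
SAMPLE_ROWS = [
    ("5.28", "history_present_illness", ["HPI", "History of Present Illness"]),
    ("0.0.1", "family_history", ["FH", "Family History"]),
    ("0.0.2", "past_medical_history", ["PMH", "Past Medical History"]),
]


def make_id(tree):
    """Derive a placeholder section id from the sectag tree value (assumed scheme)."""
    return "SECTAG_" + tree.replace(".", "_")


def header_regex(synonyms):
    """Build a line-anchored pattern matching any synonym followed by a colon."""
    # Longest synonym first, so the alternation tries longer labels before shorter ones.
    ordered = sorted(synonyms, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return r"^[ \t]*(?:" + alternatives + r")[ \t]*:"


def build_lines(rows):
    """Return one 'id||name||regex' line per row, rejecting id collisions."""
    seen = set()
    lines = []
    for tree, name, synonyms in rows:
        section_id = make_id(tree)
        if section_id in seen:
            raise ValueError(f"duplicate section id: {section_id}")
        seen.add(section_id)
        lines.append("||".join([section_id, name, header_regex(synonyms)]))
    return lines


if __name__ == "__main__":
    for line in build_lines(SAMPLE_ROWS):
        print(line)
```

If the id only needs to be unique, something like `make_id` would be enough; if the HL7 string is used downstream, that function is the part that would have to change.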

Thanks, Ryan King
