Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2017/03/31 15:01:53 UTC

RE: Labs annotator? aside: unit normalization

I'm really glad that we are getting a dialog going on this!

Dave,
I was not aware of that java units effort - cheers for the link!  I have only glanced at the adverts but will delve into it a little more later.  

Peter,
I actually have some code that detects and "opines" on weights and measures - mass v. linear measurement v. volume, etc.  However, it doesn't convert one system to another.  It is just a "dumb" splitter using string lists.  We could definitely use it as a first step to distribute weights and measures to appropriate converters.
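
As a rough illustration only (this is not the code mentioned above), a string-list splitter of this kind might look something like the sketch below. The class name, the Kind enum and the word lists are all hypothetical, and a real list would need far more entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: sorts a unit string into a broad kind of measurement. */
final class UnitKindSplitter {

    /** Illustrative categories only. */
    enum Kind { MASS, LENGTH, VOLUME, UNKNOWN }

    // Assumed word lists, stored lower-case; not taken from cTAKES.
    private static final Map<Kind, Set<String>> LISTS = new LinkedHashMap<>();
    static {
        LISTS.put(Kind.MASS, new HashSet<>(Arrays.asList(
                "g", "gram", "grams", "kg", "kilos", "lb", "lbs", "pounds")));
        LISTS.put(Kind.LENGTH, new HashSet<>(Arrays.asList(
                "mm", "cm", "m", "in", "inches", "ft", "feet")));
        LISTS.put(Kind.VOLUME, new HashSet<>(Arrays.asList(
                "ml", "cc", "l", "liter", "liters")));
    }

    /** Returns the kind whose list contains the unit text, or UNKNOWN. */
    static Kind classify(String unitText) {
        if (unitText == null) {
            return Kind.UNKNOWN;
        }
        String key = unitText.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<Kind, Set<String>> entry : LISTS.entrySet()) {
            if (entry.getValue().contains(key)) {
                return entry.getKey();
            }
        }
        return Kind.UNKNOWN;
    }
}
```

The result of classify() could then decide which converter (mass, length, volume) receives the value; anything UNKNOWN would simply be passed through untouched.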

Sean


-----Original Message-----
From: David Kincaid [mailto:kincaid.dave@gmail.com] 
Sent: Friday, March 31, 2017 8:54 AM
To: dev@ctakes.apache.org
Subject: Re: Labs annotator?

We did some work a while back to standardize weight UOMs and provide conversion services for them (from g to pounds, etc.). We used UCUM as the standard representation of the unit and the Java Units of Measurement library (http://unitsofmeasurement.github.io/) to work with them in code and do conversions between units. I found the library really well put together, and it made working with units in Java code very nice.

The biggest challenge was standardizing from the value entered in the EMR (g, grams, kg, kilos, pounds, #, lbs, etc) to the standard UCUM symbol.
Since we were just working with weights we simply hard coded some values with regular expressions. It is working today in a production system.
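
To give a flavour of that approach, here is a minimal sketch, not our production code. The class name, method names and regular expressions are made up for illustration. The UCUM symbols g, kg and [lb_av] are the standard ones, and the pound-to-kilogram factor is the exact avoirdupois definition. The conversion is plain arithmetic rather than a call into the Units of Measurement library, so that no library API is guessed at here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch: free-text weight units -> UCUM symbol -> kilograms. */
final class WeightUnitNormalizer {

    // Assumed patterns; a real EMR feed would need many more variants.
    private static final Map<Pattern, String> UCUM_BY_PATTERN = new LinkedHashMap<>();
    static {
        UCUM_BY_PATTERN.put(Pattern.compile("(?i)^(kg|kgs|kilos?|kilograms?)$"), "kg");
        UCUM_BY_PATTERN.put(Pattern.compile("(?i)^(g|gm|gms|grams?)$"), "g");
        UCUM_BY_PATTERN.put(Pattern.compile("(?i)^(#|lbs?|pounds?)$"), "[lb_av]");
    }

    /** Returns the UCUM symbol for the raw unit text, or empty if unrecognized. */
    static Optional<String> toUcum(String rawUnit) {
        if (rawUnit == null) {
            return Optional.empty();
        }
        String trimmed = rawUnit.trim();
        for (Map.Entry<Pattern, String> entry : UCUM_BY_PATTERN.entrySet()) {
            if (entry.getKey().matcher(trimmed).matches()) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    /** Converts a value to kilograms, or returns empty if the unit is unrecognized. */
    static Optional<Double> toKilograms(double value, String rawUnit) {
        Optional<String> ucum = toUcum(rawUnit);
        if (!ucum.isPresent()) {
            return Optional.empty();
        }
        switch (ucum.get()) {
            case "kg":
                return Optional.of(value);
            case "g":
                return Optional.of(value / 1000.0);
            default: // "[lb_av]": 1 lb = 0.45359237 kg exactly
                return Optional.of(value * 0.45359237);
        }
    }
}
```

Returning an empty Optional for an unrecognized unit keeps the original text usable downstream instead of forcing a wrong conversion.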

I am very excited about the potential of getting a labs annotator into cTAKES. This is on my (very long) list of projects to work on as well. So if there is anything I can do to help this along, I am more than willing to help out.

Thanks,

Dave

On Fri, Mar 31, 2017 at 3:46 AM, Abramowitsch, Peter < pabramowitsch@hearst.com> wrote:

> Hi Sean
>
> What I wrote in Java was many years ago and embedded in projects that 
> I no longer have access to.  In C# it also was 6 years ago and 
> embedded in proprietary software.  There are various approaches that might work and I
> might be able to contribute a prototype in a few months.   I've got
> another project on at the moment.  In the meantime, I can map out a 
> number of possible approaches toward that prototype.  For example, an 
> opinionated version would associate a concept with all the legal units 
> of that concept and reject anything that did not match.  Weight in 
> millimeters for example.  The non-opinionated version would just try 
> to disentangle the UOM from the concept value and try to identify & 
> normalize it.  The latter of course would be much less work and 
> require much less maintenance.  What do you think?
>
> Btw could you let me know when enough of the new system is stable 
> that I should upgrade to it?  I'm still back at 3.2.2
>
> - Peter
>
> On 3/31/17, 10:02 AM, "Finan, Sean" <Se...@childrens.harvard.edu>
> wrote:
>
> >Hi Peter,
> >
> >It would indeed be good to have some kind of unit normalizer, and 
> >maybe space in the type system to have normalized values and units as 
> >well as the original document text equivalents.  We are doing some 
> >normalization in a project but we lose reference to specific text 
> >mentions.  It is fine for our purposes, but I can see how it wouldn't play well with others.
> >
> >Yes, a little late for 4.0, but it should go on the wishlist for 
> >4.1 .2 .3 .4 ...
> >
> >Many thanks,
> >Sean
> >
> >P.s.  would you be able to contribute some of your java conversion 
> >routines?  Even if they aren't uimafied it might be faster to adapt 
> >what you have than to start from scratch.  That goes for unit 
> >detection if you have something better than ctakes.
> >
> >-----Original Message-----
> >From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
> >Sent: Friday, March 31, 2017 3:47 AM
> >To: dev@ctakes.apache.org
> >Subject: Re: Labs annotator?
> >
> >I know I'm probably too late into this discussion, but I think the 
> >extra effort to pull out the unit of measure and normalize it will 
> >make it possible to parse notes where a parameter's units are 
> >described differently from one to the next.  It would be a pain for 
> >each user of cTAKES to have to parse UOMs themselves.  Plus, with 
> >normalization it will be possible to do reliable conversions from one 
> >unit to another.
> >
> >Since there are hundreds of unit types, one could make the system 
> >take the lazy approach in the sense that it would do its best to 
> >extract and normalize but if it was unable to, then it would leave 
> >the unit attribute blank and just return the combined string.  E.g.
> >
> >Concept Weight
> >getValue() : 45
> >getUnit() : kg
> >getNormalUnit() : kg
> >getNormalValue() : 45
> >
> >Concept Weight
> >getValue() : 99
> >getUnit() : lb
> >getNormalUnit() : kg
> >getNormalValue() : 45
> >
> >Concept Eosinophil
> >getValue() : 2
> >getUnit() : %
> >getNormalUnit() : %
> >getNormalValue() : 2
> >
> >Concept Lymphocytes
> >Here, the unit cells per microliter of blood isn't yet in our unit 
> >dictionary
> >getValue() : 2600 c/ul
> >getUnit() : NULL
> >getNormalUnit() : NULL
> >getNormalValue() : NULL
> >
> >Or something like this.
> >
> >I don't know how many times I've had to write conversion routines to 
> >do this in Java, C#, and Ruby... Painful.
> >
> >Peter
> >
> >
> >
> >On 3/30/17, 7:05 PM, "Finan, Sean" <Se...@childrens.harvard.edu>
> >wrote:
> >
> >>Hi Kean,
> >>
> >>>org.apache.ctakes.typesystem.type.textsem.LabMention,
> >>whose labValue is a ResultOfTextRelation... which I don't see 
> >>actually used in the codebase...
> >>
> >>Yeah, the type system can be a little bit of a jumble, especially 
> >>without a history lesson and because of a paucity of examples in the 
> >>code ...
> >>things like LabMention being a proper subclass and subtype, but other
> >>things like ResultOfT.R. being a subtype ... But you have it right.   For
> >>LabMention.getLabValue() there is a forced return type of "ResultOf"
> >>enforced by the return class ResultOfT.R., even though "subclass"
> >>ResultOfT.R. has no methods other than those of its parent class ...
> >>As I said, it begs for a discussion on the history of uima and 
> >>ctakes and automatic type generation, etc.
> >>
> >>>but whose arg1 seems as if it should be (a RelationArgument 
> >>>wrapping) the lab concept annotation (e.g. "weight" or "Albumin"), 
> >>>arg2 the value annotation (e.g. "46 kg" or "2.2").
> >>
> >>This question of element-attribute ordering can also be a matter of 
> >>confusion.  Others (Tim, Dima, want to weigh in?) might advise 
> >>otherwise, but I think that there is a fiwo (first in wins out)  
> >>matter here.
> >>"weight", "46kg" is fine until consensus says otherwise.  This is 
> >>why ResultOfT.R. should probably be a proper subclass with 
> >>getAction()
> >>getResult() methods that delegate to arg1 and arg2, which can be 
> >>refactored/switched at any point in the future ...  but meanwhile 
> >>developers would have an obvious hint as to what is what ... I 
> >>should really write a wishlist of ctakes refactorings ...
> >>
> >>And stop writing.  Sorry, it has been an interesting couple of weeks 
> >>for me.
> >>That was a bit long-winded.  In summary I think that your 
> >>understanding is correct.
> >>
> >>Thanks,
> >>Sean
> >>
> >>
> >>-----Original Message-----
> >>From: Kean Kaufmann [mailto:kean@recordsone.com]
> >>Sent: Thursday, March 30, 2017 9:16 AM
> >>To: dev@ctakes.apache.org
> >>Subject: Re: Labs annotator?
> >>
> >>Sean & Pei -- Glad it can be useful!  Haven't done this before, so 
> >>please bear with me.
> >>
> >>Pei:
> >>I've signed up at issues.apache.org; I'll open an issue and attach 
> >>some code sometime in the next week or so.
> >>Is Java 8 ok?
> >>
> >>Sean:
> >>Our type-system tweaks are oversimplifications; I didn't wrap my 
> >>head around the relation-extraction KR.  May I sanity-check with you?
> >>
> >>So, let's see... there's
> >>org.apache.ctakes.typesystem.type.textsem.LabMention,
> >>whose labValue is a ResultOfTextRelation... which I don't see 
> >>actually used in the codebase... but whose arg1 seems as if it 
> >>should be (a RelationArgument wrapping) the lab concept annotation 
> >>(e.g. "weight" or "Albumin"), arg2 the value annotation (e.g. "46 kg" or "2.2").
> >>
> >>Does that seem right? If my understanding is incomplete/incorrect, 
> >>please let me know.
> >>
> >>Thanks,
> >>-Kean
> >>
> >>
> >>On Wed, Mar 29, 2017 at 9:32 AM, Pei Chen <ch...@apache.org> wrote:
> >>
> >>> Kean,
> >>> This would be really useful.  If you would like to make a 
> >>> contribution, could you please open a Jira and attach the patch or 
> >>> code?  When you submit a patch via jira/attachment, it has legal 
> >>> verbiage about donating the code, etc.
> >>>
> >>> --Pei
> >>>
> >>>
> >>> On Wed, Mar 29, 2017 at 9:30 AM, Finan, Sean 
> >>> <Se...@childrens.harvard.edu> wrote:
> >>> > Fantastic!
> >>> >
> >>> > I would really like to work with you to get this into ctakes 4.1.
> >>> > Let me know how you would like to proceed.  Would you like to send me 
> >>> > or another committer the code or have somebody review it remotely?  
> >>> > The "tweaks" may be something useful to ctakes, but if not I'm 
> >>> > sure that we can create a decent interface.
> >>> >
> >>> > Cheers,
> >>> > Sean
> >>> >
> >>> > -----Original Message-----
> >>> > From: Kean Kaufmann [mailto:kean@recordsone.com]
> >>> > Sent: Wednesday, March 29, 2017 7:59 AM
> >>> > To: dev@ctakes.apache.org
> >>> > Subject: Re: Labs annotator?
> >>> >
> >>> >>
> >>> >> I'm sure that people would love to see lab values in ctakes!
> >>> >> Could you please write a small summary of what it does?  Maybe 
> >>> >> an example or two could suffice.
> >>> >
> >>> >
> >>> > Hi Sean,
> >>> >
> >>> > The labs annotator identifies likely lab phrases by TUI (T059 et 
> >>> > al.), and relates them to the nearest following number-ish value -- 
> >>> > NumToken, FractionAnnotation, MeasurementAnnotation or (as a last
> >>> > resort) RangeAnnotation -- that isn't part of a Date or TimeAnnotation.
> >>> > A whitelist of lab-value words can also be specified, e.g.
> >>> > "positive", "negative", "normal", "elevated", "decreased", ...
> >>> >
> >>> > For example,
> >>> >
> >>> > Weight / BMI:  Recent weight (as of 05/05/16) is
> >>> >> 45.36 kg (100 lb)
> >>> >
> >>> >
> >>> > yields
> >>> >
> >>> > "weight" -> "45.36 kg"
> >>> >
> >>> > and
> >>> >
> >>> > HEPATIC FUNCTION PANEL
> >>> >>     Result                Value    Ref Range
> >>> >>     Albumin               2.2 (*)  3.7 - 5.1 g/dL
> >>> >>     Total Protein         5.5 (*)  5.8 - 8.0 g/dL
> >>> >>     Alkaline Phosphatase  844 (*)  42 - 121 IU/L
> >>> >>     ...
> >>> >
> >>> >
> >>> > yields
> >>> >
> >>> > "Albumin" -> "2.2"
> >>> > "Protein" -> "5.5"
> >>> > "Alkaline Phosphatase" -> "844"
> >>> >
> >>> > (without trying to fill in the units or referenceRangeNarrative
> >>> > values).
> >>> >
> >>> > Configuration parameters:
> >>> > * ids of segments to annotate
> >>> > * TUIs indicating labs - I use T059, T060 and T121
> >>> > * CUIs too general to be useful, e.g. C1443182, "Calculated
> >>> >   (procedure)"
> >>> > * Whitelist of words allowed as lab values
> >>> > * Maximum number of newlines permitted between lab and value (0 
> >>> >   = must be on same line)
> >>> >
> >>> > I'd need to check in with you to make sure it plays nicely with 
> >>> > the cTAKES type system; we've tweaked ours a bit.
> >>> >
> >>> > Best,
> >>> > -kk
> >>> >
> >>> >
> >>> > On Tue, Mar 28, 2017 at 11:45 AM, Finan, Sean
> >>> > <Sean.Finan@childrens.harvard.edu> wrote:
> >>> >
> >>> >> Hi Kean,
> >>> >>
> >>> >> I'm sure that people would love to see lab values in ctakes!
> >>> >> Could you please write a small summary of what it does?  Maybe 
> >>> >> an example or two could suffice.
> >>> >>
> >>> >> We can definitely put it into ctakes in release 4.1 - maybe 
> >>> >> next quarter?
> >>> >>
> >>> >> Cheers,
> >>> >> Sean
> >>> >>
> >>> >> -----Original Message-----
> >>> >> From: Kean Kaufmann [mailto:kean@recordsone.com]
> >>> >> Sent: Tuesday, March 28, 2017 11:34 AM
> >>> >> To: dev@ctakes.apache.org
> >>> >> Subject: Labs annotator?
> >>> >>
> >>> >> On Tue, Mar 28, 2017 at 11:23 AM, Finan, Sean < 
> >>> >> Sean.Finan@childrens.harvard.edu> wrote:
> >>> >>
> >>> >> >
> >>> >> > If anybody out there has something that they would like to 
> >>> >> > contribute to ctakes, please do!
> >>> >> >
> >>> >>
> >>> >> I recently wrote an annotator for lab values.  There was some 
> >>> >> discussion of this on the dev list a couple of years ago; did 
> >>> >> anything come of it?
> >>> >> Happy to contribute if it's helpful.
> >>> >>
> >>> >> --
> >>> >> _____________________________________________________
> >>> >> *Kean Kaufmann*
> >>> >> NLP Developer
> >>> >>
> >>> >> RecordsOne
> >>> >>
> >>>
> >
>
>