You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ctakes.apache.org by Tim Miller <ti...@childrens.harvard.edu> on 2013/02/04 22:43:55 UTC

assistance with dictionary lookup issue

Pei helped me track down an issue with performance I'd noticed in the 
dictionary annotator, and I have filed the issue here: 
https://issues.apache.org/jira/browse/CTAKES-143

I implemented a quick and dirty proof of concept fix and noticed 
dramatic performance improvement.  I attached the patch to the issue, 
but it involves changing an interface (currently does not try to fix 
other implementing classes so obviously not ready for primetime), so I 
wanted to solicit the list first in case anyone with better knowledge of 
that module has some better engineering ideas than what I came up with.

Thanks,

-- 
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Children's Hospital Boston and Harvard Medical School
617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

My two cents, for what they are worth ...

You never want to rely upon an implicit contract between developer and code.  What I mean is that you never know how another developer (or you at a later time) might use something.  If a method can receive a parameter of a type other than required by that method, then you must assume that at some point it will.  Unless we utilize a subclass "SortedXXX" or flag "isSorted()" or something of that kind, random ordering must be assumed and dealt with - even if it may lead to redundant sorts.  If the items are assumed to be sorted already, then an insertion sort is ok.

My vote is to leave the sort in there.

-----Original Message-----
From: Tim Miller [mailto:timothy.miller@childrens.harvard.edu] 
Sent: Monday, February 04, 2013 5:35 PM
To: ctakes-dev@incubator.apache.org
Subject: Re: assistance with dictionary lookup issue

What do we know about under what circumstances an annotation will be sorted?

On 02/04/2013 05:01 PM, Masanz, James J. wrote:
> I'll take a look at the patch. Also be aware of https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of enhancing performance  -- if willing to assume annotations (BaseTokens currently) are sorted. Currently it's always BaseToken and always sorted, just not sure if we want to code to that assumption.
>
> ________________________________________
> From: 
> ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org 
> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on 
> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> Sent: Monday, February 04, 2013 3:43 PM
> To: ctakes-dev@incubator.apache.org
> Subject: assistance with dictionary lookup issue
>
> Pei helped me track down an issue with performance I'd noticed in the 
> dictionary annotator, and I have filed the issue here:
> https://issues.apache.org/jira/browse/CTAKES-143
>
> I implemented a quick and dirty proof of concept fix and noticed 
> dramatic performance improvement.  I attached the patch to the issue, 
> but it involves changing an interface (currently does not try to fix 
> other implementing classes so obviously not ready for primetime), so I 
> wanted to solicit the list first in case anyone with better knowledge 
> of that module has some better engineering ideas than what I came up with.
>
> Thanks,
>
> --
> Tim Miller, PhD
> Postdoctoral Research Fellow
> Children's Hospital Informatics Program Children's Hospital Boston and 
> Harvard Medical School
> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Masanz, James J." <Ma...@mayo.edu>.

I don't know enough. I considered suggesting as a quick way to get a performance improvement to check if the type is BaseToken and then assume sorted. I know that's not pretty, but given the gain it seemed wothwhile to suggest - just hadn't had a chance to get to posting something about that already and created that JIRA issue so it wasn't totally forgotten.

-- James


> -----Original Message-----
> From: ctakes-dev-return-1139-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1139-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Tim Miller
> Sent: Monday, February 04, 2013 4:34 PM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: assistance with dictionary lookup issue
> 
> What do we know about under what circumstances an annotation will be
> sorted?
> 
> On 02/04/2013 05:01 PM, Masanz, James J. wrote:
> > I'll take a look at the patch. Also be aware of
> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
> enhancing performance  -- if willing to assume annotations (BaseTokens
> currently) are sorted. Currently it's always BaseToken and always sorted,
> just not sure if we want to code to that assumption.
> >
> > ________________________________________
> > From:
> > ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
> > [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
> > behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> > Sent: Monday, February 04, 2013 3:43 PM
> > To: ctakes-dev@incubator.apache.org
> > Subject: assistance with dictionary lookup issue
> >
> > Pei helped me track down an issue with performance I'd noticed in the
> > dictionary annotator, and I have filed the issue here:
> > https://issues.apache.org/jira/browse/CTAKES-143
> >
> > I implemented a quick and dirty proof of concept fix and noticed
> > dramatic performance improvement.  I attached the patch to the issue,
> > but it involves changing an interface (currently does not try to fix
> > other implementing classes so obviously not ready for primetime), so I
> > wanted to solicit the list first in case anyone with better knowledge
> > of that module has some better engineering ideas than what I came up
> with.
> >
> > Thanks,
> >
> > --
> > Tim Miller, PhD
> > Postdoctoral Research Fellow
> > Children's Hospital Informatics Program Children's Hospital Boston and
> > Harvard Medical School
> > 617-919-1223

Re: assistance with dictionary lookup issue

Posted by Tim Miller <ti...@childrens.harvard.edu>.

What do we know about under what circumstances an annotation will be sorted?

On 02/04/2013 05:01 PM, Masanz, James J. wrote:
> I'll take a look at the patch. Also be aware of https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of enhancing performance  -- if willing to assume annotations (BaseTokens currently) are sorted. Currently it's always BaseToken and always sorted, just not sure if we want to code to that assumption.
>
> ________________________________________
> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> Sent: Monday, February 04, 2013 3:43 PM
> To: ctakes-dev@incubator.apache.org
> Subject: assistance with dictionary lookup issue
>
> Pei helped me track down an issue with performance I'd noticed in the
> dictionary annotator, and I have filed the issue here:
> https://issues.apache.org/jira/browse/CTAKES-143
>
> I implemented a quick and dirty proof of concept fix and noticed
> dramatic performance improvement.  I attached the patch to the issue,
> but it involves changing an interface (currently does not try to fix
> other implementing classes so obviously not ready for primetime), so I
> wanted to solicit the list first in case anyone with better knowledge of
> that module has some better engineering ideas than what I came up with.
>
> Thanks,
>
> --
> Tim Miller, PhD
> Postdoctoral Research Fellow
> Children's Hospital Informatics Program
> Children's Hospital Boston and Harvard Medical School
> 617-919-1223

Re: assistance with dictionary lookup issue

Posted by Tim Miller <ti...@childrens.harvard.edu>.

New patch with these changes attached to ctakes-143.
Tim

On 02/05/2013 11:14 AM, Finan, Sean wrote:
> Looks good to me.
>
> -----Original Message-----
> From: Masanz, James J. [mailto:Masanz.James@mayo.edu]
> Sent: Tuesday, February 05, 2013 11:09 AM
> To: 'ctakes-dev@incubator.apache.org'
> Subject: RE: assistance with dictionary lookup issue
>
> Yes, that's what I had in mind, with a future work item to consider doing the same improvement to the other LookupInitializer(s) some day.
>
> -- James
>
>> -----Original Message-----
>> From:
>> ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.org
>> [mailto:ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.
>> org]
>> On Behalf Of Tim Miller
>> Sent: Tuesday, February 05, 2013 10:05 AM
>> To: ctakes-dev@incubator.apache.org
>> Subject: Re: assistance with dictionary lookup issue
>>
>> OK thanks, that clarifies it (including Sean's concern) for me.  I
>> think calling it getSortedLookupTokens is the right idea.  Then it
>> makes it clear to the implementing class that at some point sorting
>> has to be done, and if its not implicit/free in the implementation (as
>> it is if you base it on Annotations), then it needs to be explicit.
>> Since that can be a substantial performance penalty, putting the onus
>> on the implementation will hopefully lead to better implementations.
>>
>> So to summarize the changes to make sure we're on the same page:
>>
>>    * the interface will add a method:
>>
>>               public List getSortedLookupTokens(JCas, Annotation);
>>
>>    * getLookupTokenIterator() will be reverted to its old version.
>>
>>    * FirstTokenPermLookupInitializerImpl will have its
>>      getLookupTokenIterator method reverted, and my changes
>>
>> will go in the implementation of getSortedLookupTokens()   modulo the
>> unnecessary iterator inside.
>>
>>    * DictionaryLookupAnnotator will be changed to call
>>      getSortedLookupTokens instead of
>>      getLookupTokenIterator/constrainToWindow
>>    * Other LookupInitializer's will be implemented to simply call
>>      existing getLookupTokenIterator and do post-processing to constrain
>>      to the window (?)
>>
>> Does this match what you had in mind James?  Any objections or things
>> I'm missing anyone?
>>
>>
>> On 02/05/2013 10:38 AM, Masanz, James J. wrote:
>>> First, about the loop - I had been looking too quickly at the diff
>>> and
>> didn't notice the logic about punctuation etc
>>> Second, what I remember when I looked at it before, was seeing
>>> interface
>> named LookupInitializer, which being old enough, doesn't have Iterator
>> parameterized in the definition of the getLookupTokenIterator:
>>>       public Iterator getLookupTokenIterator(JCas jcas)
>>>               throws AnnotatorInitializationException;
>>>
>>> and that ends up being effectively an Iterator<LookupToken> and
>> LookupToken does not inherit from Annotation, and I stopped at that point.
>>> But now looking farther, it looks to me that that's fine because in
>> FirstTokenPermLookupInitializerImpl, we look through the BaseTokens
>> and create the list of LookupTokens based on the (sorted) BaseTokens,
>> so the LookupTokens will be sorted too.
>>> so maybe we should call the new method getSortedLookupTokens to make
>>> it
>> clear they too are sorted
>>> ________________________________________
>>> From:
>>> ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org
>> [ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org] on
>> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
>>> Sent: Tuesday, February 05, 2013 9:10 AM
>>> To: ctakes-dev@incubator.apache.org
>>> Subject: Re: assistance with dictionary lookup issue
>>>
>>> Yeah, if you mean just change the loop to iterate over the list
>>> instead of getting an iterator that makes sense.  There is still
>>> some logic in there to leave out punctuation tokens but I think you
>>> were implying that to be in your mockup diff.
>>>
>>> As for sorting, the AnnotationIndex defines a sort order for its
>> iterators:
>>> http://uima.apache.org/d/uimaj-
>> 2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html
>>> so we are safe assuming that anything extending Annotation will be
>>> iterated in sorted order.  Does that answer the questions we had? Or
>>> was I missing something subtle in that discussion?
>>>
>>> Tim
>>>
>>> On 02/05/2013 09:44 AM, Masanz, James J. wrote:
>>>> Looks good to me, with one question.
>>>>
>>>> Instead of getting an iterator and then building a new list, can we
>> just skip getting the iterator and use the list that selectCovered
>> returns?
>>>> I will mock up a diff here of what I mean:
>>>> -     Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas,
>> BaseToken.class, covering).iterator();
>>>> -     while (btaItr.hasNext())
>>>> -             {
>>>> -                     BaseToken bta = (BaseToken) btaItr.next();
>>>> -                             ltList.add(lt);
>>>> -                     }
>>>> -             }
>>>>
>>>> +     ltList = org.uimafit.util.JCasUtil.selectCovered(jcas,
>> BaseToken.class, covering);
>>>>         return ltList;
>>>>
>>>> I know you said it was quick and dirty at the moment - my 2 cents -
>> unless someone comes up with a better engineered solution, I think we
>> could add the new method (with a name like getLookupTokens) and leave
>> the old one so we don't have to deprecate anything. And phase in the
>> change to the various *LookupInitializerImpl classes if needed.
>>>> -- James
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: ctakes-dev-return-1138-
>> Masanz.James=mayo.edu@incubator.apache.org
>>>>> [mailto:ctakes-dev-return-1138-
>> Masanz.James=mayo.edu@incubator.apache.org]
>>>>> On Behalf Of Masanz, James J.
>>>>> Sent: Monday, February 04, 2013 4:01 PM
>>>>> To: ctakes-dev@incubator.apache.org
>>>>> Subject: RE: assistance with dictionary lookup issue
>>>>>
>>>>> I'll take a look at the patch. Also be aware of
>>>>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about
>>>>> a
>> way of
>>>>> enhancing performance  -- if willing to assume annotations
>>>>> (BaseTokens
>>>>> currently) are sorted. Currently it's always BaseToken and always
>> sorted,
>>>>> just not sure if we want to code to that assumption.
>>>>>
>>>>> ________________________________________
>>>>> From: ctakes-dev-return-1137-
>> Masanz.James=mayo.edu@incubator.apache.org
>>>>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
>>>>> ] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
>>>>> Sent: Monday, February 04, 2013 3:43 PM
>>>>> To: ctakes-dev@incubator.apache.org
>>>>> Subject: assistance with dictionary lookup issue
>>>>>
>>>>> Pei helped me track down an issue with performance I'd noticed in
>>>>> the dictionary annotator, and I have filed the issue here:
>>>>> https://issues.apache.org/jira/browse/CTAKES-143
>>>>>
>>>>> I implemented a quick and dirty proof of concept fix and noticed
>> dramatic
>>>>> performance improvement.  I attached the patch to the issue, but
>>>>> it involves changing an interface (currently does not try to fix
>>>>> other implementing classes so obviously not ready for primetime),
>>>>> so I
>> wanted to
>>>>> solicit the list first in case anyone with better knowledge of
>>>>> that
>> module
>>>>> has some better engineering ideas than what I came up with.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --
>>>>> Tim Miller, PhD
>>>>> Postdoctoral Research Fellow
>>>>> Children's Hospital Informatics Program Children's Hospital Boston
>>>>> and Harvard Medical School
>>>>> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Looks good to me. 

-----Original Message-----
From: Masanz, James J. [mailto:Masanz.James@mayo.edu] 
Sent: Tuesday, February 05, 2013 11:09 AM
To: 'ctakes-dev@incubator.apache.org'
Subject: RE: assistance with dictionary lookup issue

Yes, that's what I had in mind, with a future work item to consider doing the same improvement to the other LookupInitializer(s) some day.

-- James

> -----Original Message-----
> From: 
> ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.
> org]
> On Behalf Of Tim Miller
> Sent: Tuesday, February 05, 2013 10:05 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: assistance with dictionary lookup issue
> 
> OK thanks, that clarifies it (including Sean's concern) for me.  I 
> think calling it getSortedLookupTokens is the right idea.  Then it 
> makes it clear to the implementing class that at some point sorting 
> has to be done, and if its not implicit/free in the implementation (as 
> it is if you base it on Annotations), then it needs to be explicit.  
> Since that can be a substantial performance penalty, putting the onus 
> on the implementation will hopefully lead to better implementations.
> 
> So to summarize the changes to make sure we're on the same page:
> 
>   * the interface will add a method:
> 
>              public List getSortedLookupTokens(JCas, Annotation);
> 
>   * getLookupTokenIterator() will be reverted to its old version.
> 
>   * FirstTokenPermLookupInitializerImpl will have its
>     getLookupTokenIterator method reverted, and my changes
> 
> will go in the implementation of getSortedLookupTokens()   modulo the
> unnecessary iterator inside.
> 
>   * DictionaryLookupAnnotator will be changed to call
>     getSortedLookupTokens instead of
>     getLookupTokenIterator/constrainToWindow
>   * Other LookupInitializer's will be implemented to simply call
>     existing getLookupTokenIterator and do post-processing to constrain
>     to the window (?)
> 
> Does this match what you had in mind James?  Any objections or things 
> I'm missing anyone?
> 
> 
> On 02/05/2013 10:38 AM, Masanz, James J. wrote:
> > First, about the loop - I had been looking too quickly at the diff 
> > and
> didn't notice the logic about punctuation etc
> >
> > Second, what I remember when I looked at it before, was seeing 
> > interface
> named LookupInitializer, which being old enough, doesn't have Iterator 
> parameterized in the definition of the getLookupTokenIterator:
> >      public Iterator getLookupTokenIterator(JCas jcas)
> >              throws AnnotatorInitializationException;
> >
> > and that ends up being effectively an Iterator<LookupToken> and
> LookupToken does not inherit from Annotation, and I stopped at that point.
> >
> > But now looking farther, it looks to me that that's fine because in
> FirstTokenPermLookupInitializerImpl, we look through the BaseTokens 
> and create the list of LookupTokens based on the (sorted) BaseTokens, 
> so the LookupTokens will be sorted too.
> >
> > so maybe we should call the new method getSortedLookupTokens to make 
> > it
> clear they too are sorted
> >
> > ________________________________________
> > From: 
> > ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org] on 
> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> > Sent: Tuesday, February 05, 2013 9:10 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: Re: assistance with dictionary lookup issue
> >
> > Yeah, if you mean just change the loop to iterate over the list 
> > instead of getting an iterator that makes sense.  There is still 
> > some logic in there to leave out punctuation tokens but I think you 
> > were implying that to be in your mockup diff.
> >
> > As for sorting, the AnnotationIndex defines a sort order for its
> iterators:
> > http://uima.apache.org/d/uimaj-
> 2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html
> >
> > so we are safe assuming that anything extending Annotation will be 
> > iterated in sorted order.  Does that answer the questions we had? Or 
> > was I missing something subtle in that discussion?
> >
> > Tim
> >
> > On 02/05/2013 09:44 AM, Masanz, James J. wrote:
> >> Looks good to me, with one question.
> >>
> >> Instead of getting an iterator and then building a new list, can we
> just skip getting the iterator and use the list that selectCovered 
> returns?
> >>
> >> I will mock up a diff here of what I mean:
> >> -     Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas,
> BaseToken.class, covering).iterator();
> >> -     while (btaItr.hasNext())
> >> -             {
> >> -                     BaseToken bta = (BaseToken) btaItr.next();
> >> -                             ltList.add(lt);
> >> -                     }
> >> -             }
> >>
> >> +     ltList = org.uimafit.util.JCasUtil.selectCovered(jcas,
> BaseToken.class, covering);
> >>
> >>        return ltList;
> >>
> >> I know you said it was quick and dirty at the moment - my 2 cents -
> unless someone comes up with a better engineered solution, I think we 
> could add the new method (with a name like getLookupTokens) and leave 
> the old one so we don't have to deprecate anything. And phase in the 
> change to the various *LookupInitializerImpl classes if needed.
> >>
> >> -- James
> >>
> >>
> >>> -----Original Message-----
> >>> From: ctakes-dev-return-1138-
> Masanz.James=mayo.edu@incubator.apache.org
> >>> [mailto:ctakes-dev-return-1138-
> Masanz.James=mayo.edu@incubator.apache.org]
> >>> On Behalf Of Masanz, James J.
> >>> Sent: Monday, February 04, 2013 4:01 PM
> >>> To: ctakes-dev@incubator.apache.org
> >>> Subject: RE: assistance with dictionary lookup issue
> >>>
> >>> I'll take a look at the patch. Also be aware of
> >>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about 
> >>> a
> way of
> >>> enhancing performance  -- if willing to assume annotations 
> >>> (BaseTokens
> >>> currently) are sorted. Currently it's always BaseToken and always
> sorted,
> >>> just not sure if we want to code to that assumption.
> >>>
> >>> ________________________________________
> >>> From: ctakes-dev-return-1137-
> Masanz.James=mayo.edu@incubator.apache.org
> >>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
> >>> ] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> >>> Sent: Monday, February 04, 2013 3:43 PM
> >>> To: ctakes-dev@incubator.apache.org
> >>> Subject: assistance with dictionary lookup issue
> >>>
> >>> Pei helped me track down an issue with performance I'd noticed in 
> >>> the dictionary annotator, and I have filed the issue here:
> >>> https://issues.apache.org/jira/browse/CTAKES-143
> >>>
> >>> I implemented a quick and dirty proof of concept fix and noticed
> dramatic
> >>> performance improvement.  I attached the patch to the issue, but 
> >>> it involves changing an interface (currently does not try to fix 
> >>> other implementing classes so obviously not ready for primetime), 
> >>> so I
> wanted to
> >>> solicit the list first in case anyone with better knowledge of 
> >>> that
> module
> >>> has some better engineering ideas than what I came up with.
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>> Tim Miller, PhD
> >>> Postdoctoral Research Fellow
> >>> Children's Hospital Informatics Program Children's Hospital Boston 
> >>> and Harvard Medical School
> >>> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Masanz, James J." <Ma...@mayo.edu>.

Yes, that's what I had in mind, with a future work item to consider doing the same improvement to the other LookupInitializer(s) some day.

-- James

> -----Original Message-----
> From: ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1148-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Tim Miller
> Sent: Tuesday, February 05, 2013 10:05 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: assistance with dictionary lookup issue
> 
> OK thanks, that clarifies it (including Sean's concern) for me.  I think
> calling it getSortedLookupTokens is the right idea.  Then it makes it
> clear to the implementing class that at some point sorting has to be done,
> and if its not implicit/free in the implementation (as it is if you base
> it on Annotations), then it needs to be explicit.  Since that can be a
> substantial performance penalty, putting the onus on the implementation
> will hopefully lead to better implementations.
> 
> So to summarize the changes to make sure we're on the same page:
> 
>   * the interface will add a method:
> 
>              public List getSortedLookupTokens(JCas, Annotation);
> 
>   * getLookupTokenIterator() will be reverted to its old version.
> 
>   * FirstTokenPermLookupInitializerImpl will have its
>     getLookupTokenIterator method reverted, and my changes
> 
> will go in the implementation of getSortedLookupTokens()   modulo the
> unnecessary iterator inside.
> 
>   * DictionaryLookupAnnotator will be changed to call
>     getSortedLookupTokens instead of
>     getLookupTokenIterator/constrainToWindow
>   * Other LookupInitializer's will be implemented to simply call
>     existing getLookupTokenIterator and do post-processing to constrain
>     to the window (?)
> 
> Does this match what you had in mind James?  Any objections or things I'm
> missing anyone?
> 
> 
> On 02/05/2013 10:38 AM, Masanz, James J. wrote:
> > First, about the loop - I had been looking too quickly at the diff and
> didn't notice the logic about punctuation etc
> >
> > Second, what I remember when I looked at it before, was seeing interface
> named LookupInitializer, which being old enough, doesn't have Iterator
> parameterized in the definition of the getLookupTokenIterator:
> >      public Iterator getLookupTokenIterator(JCas jcas)
> >              throws AnnotatorInitializationException;
> >
> > and that ends up being effectively an Iterator<LookupToken> and
> LookupToken does not inherit from Annotation, and I stopped at that point.
> >
> > But now looking farther, it looks to me that that's fine because in
> FirstTokenPermLookupInitializerImpl, we look through the BaseTokens and
> create the list of LookupTokens based on the (sorted) BaseTokens, so the
> LookupTokens will be sorted too.
> >
> > so maybe we should call the new method getSortedLookupTokens to make it
> clear they too are sorted
> >
> > ________________________________________
> > From: ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org] on
> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> > Sent: Tuesday, February 05, 2013 9:10 AM
> > To: ctakes-dev@incubator.apache.org
> > Subject: Re: assistance with dictionary lookup issue
> >
> > Yeah, if you mean just change the loop to iterate over the list instead
> > of getting an iterator that makes sense.  There is still some logic in
> > there to leave out punctuation tokens but I think you were implying that
> > to be in your mockup diff.
> >
> > As for sorting, the AnnotationIndex defines a sort order for its
> iterators:
> > http://uima.apache.org/d/uimaj-
> 2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html
> >
> > so we are safe assuming that anything extending Annotation will be
> > iterated in sorted order.  Does that answer the questions we had? Or was
> > I missing something subtle in that discussion?
> >
> > Tim
> >
> > On 02/05/2013 09:44 AM, Masanz, James J. wrote:
> >> Looks good to me, with one question.
> >>
> >> Instead of getting an iterator and then building a new list, can we
> just skip getting the iterator and use the list that selectCovered
> returns?
> >>
> >> I will mock up a diff here of what I mean:
> >> -     Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas,
> BaseToken.class, covering).iterator();
> >> -     while (btaItr.hasNext())
> >> -             {
> >> -                     BaseToken bta = (BaseToken) btaItr.next();
> >> -                             ltList.add(lt);
> >> -                     }
> >> -             }
> >>
> >> +     ltList = org.uimafit.util.JCasUtil.selectCovered(jcas,
> BaseToken.class, covering);
> >>
> >>        return ltList;
> >>
> >> I know you said it was quick and dirty at the moment - my 2 cents -
> unless someone comes up with a better engineered solution, I think we
> could add the new method (with a name like getLookupTokens) and leave the
> old one so we don't have to deprecate anything. And phase in the change to
> the various *LookupInitializerImpl classes if needed.
> >>
> >> -- James
> >>
> >>
> >>> -----Original Message-----
> >>> From: ctakes-dev-return-1138-
> Masanz.James=mayo.edu@incubator.apache.org
> >>> [mailto:ctakes-dev-return-1138-
> Masanz.James=mayo.edu@incubator.apache.org]
> >>> On Behalf Of Masanz, James J.
> >>> Sent: Monday, February 04, 2013 4:01 PM
> >>> To: ctakes-dev@incubator.apache.org
> >>> Subject: RE: assistance with dictionary lookup issue
> >>>
> >>> I'll take a look at the patch. Also be aware of
> >>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a
> way of
> >>> enhancing performance  -- if willing to assume annotations (BaseTokens
> >>> currently) are sorted. Currently it's always BaseToken and always
> sorted,
> >>> just not sure if we want to code to that assumption.
> >>>
> >>> ________________________________________
> >>> From: ctakes-dev-return-1137-
> Masanz.James=mayo.edu@incubator.apache.org
> >>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
> >>> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> >>> Sent: Monday, February 04, 2013 3:43 PM
> >>> To: ctakes-dev@incubator.apache.org
> >>> Subject: assistance with dictionary lookup issue
> >>>
> >>> Pei helped me track down an issue with performance I'd noticed in the
> >>> dictionary annotator, and I have filed the issue here:
> >>> https://issues.apache.org/jira/browse/CTAKES-143
> >>>
> >>> I implemented a quick and dirty proof of concept fix and noticed
> dramatic
> >>> performance improvement.  I attached the patch to the issue, but it
> >>> involves changing an interface (currently does not try to fix other
> >>> implementing classes so obviously not ready for primetime), so I
> wanted to
> >>> solicit the list first in case anyone with better knowledge of that
> module
> >>> has some better engineering ideas than what I came up with.
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>> Tim Miller, PhD
> >>> Postdoctoral Research Fellow
> >>> Children's Hospital Informatics Program
> >>> Children's Hospital Boston and Harvard Medical School
> >>> 617-919-1223

Re: assistance with dictionary lookup issue

Posted by Tim Miller <ti...@childrens.harvard.edu>.

OK thanks, that clarifies it (including Sean's concern) for me.  I think 
calling it getSortedLookupTokens is the right idea.  Then it makes it 
clear to the implementing class that at some point sorting has to be 
done, and if its not implicit/free in the implementation (as it is if 
you base it on Annotations), then it needs to be explicit.  Since that 
can be a substantial performance penalty, putting the onus on the 
implementation will hopefully lead to better implementations.

So to summarize the changes to make sure we're on the same page:

  * the interface will add a method:

             public List getSortedLookupTokens(JCas, Annotation);

  * getLookupTokenIterator() will be reverted to its old version.

  * FirstTokenPermLookupInitializerImpl will have its
    getLookupTokenIterator method reverted, and my changes

will go in the implementation of getSortedLookupTokens()   modulo the 
unnecessary iterator inside.

  * DictionaryLookupAnnotator will be changed to call
    getSortedLookupTokens instead of
    getLookupTokenIterator/constrainToWindow
  * Other LookupInitializer's will be implemented to simply call
    existing getLookupTokenIterator and do post-processing to constrain
    to the window (?)

Does this match what you had in mind James?  Any objections or things 
I'm missing anyone?


On 02/05/2013 10:38 AM, Masanz, James J. wrote:
> First, about the loop - I had been looking too quickly at the diff and didn't notice the logic about punctuation etc
>
> Second, what I remember when I looked at it before, was seeing interface named LookupInitializer, which being old enough, doesn't have Iterator parameterized in the definition of the getLookupTokenIterator:
>      public Iterator getLookupTokenIterator(JCas jcas)
>              throws AnnotatorInitializationException;
>
> and that ends up being effectively an Iterator<LookupToken> and LookupToken does not inherit from Annotation, and I stopped at that point.
>
> But now looking farther, it looks to me that that's fine because in FirstTokenPermLookupInitializerImpl, we look through the BaseTokens and create the list of LookupTokens based on the (sorted) BaseTokens, so the LookupTokens will be sorted too.
>
> so maybe we should call the new method getSortedLookupTokens to make it clear they too are sorted
>
> ________________________________________
> From: ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> Sent: Tuesday, February 05, 2013 9:10 AM
> To: ctakes-dev@incubator.apache.org
> Subject: Re: assistance with dictionary lookup issue
>
> Yeah, if you mean just change the loop to iterate over the list instead
> of getting an iterator that makes sense.  There is still some logic in
> there to leave out punctuation tokens but I think you were implying that
> to be in your mockup diff.
>
> As for sorting, the AnnotationIndex defines a sort order for its iterators:
> http://uima.apache.org/d/uimaj-2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html
>
> so we are safe assuming that anything extending Annotation will be
> iterated in sorted order.  Does that answer the questions we had? Or was
> I missing something subtle in that discussion?
>
> Tim
>
> On 02/05/2013 09:44 AM, Masanz, James J. wrote:
>> Looks good to me, with one question.
>>
>> Instead of getting an iterator and then building a new list, can we just skip getting the iterator and use the list that selectCovered returns?
>>
>> I will mock up a diff here of what I mean:
>> -     Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
>> -     while (btaItr.hasNext())
>> -             {
>> -                     BaseToken bta = (BaseToken) btaItr.next();
>> -                             ltList.add(lt);
>> -                     }
>> -             }
>>
>> +     ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
>>
>>        return ltList;
>>
>> I know you said it was quick and dirty at the moment - my 2 cents - unless someone comes up with a better engineered solution, I think we could add the new method (with a name like getLookupTokens) and leave the old one so we don't have to deprecate anything. And phase in the change to the various *LookupInitializerImpl classes if needed.
>>
>> -- James
>>
>>
>>> -----Original Message-----
>>> From: ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org
>>> [mailto:ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org]
>>> On Behalf Of Masanz, James J.
>>> Sent: Monday, February 04, 2013 4:01 PM
>>> To: ctakes-dev@incubator.apache.org
>>> Subject: RE: assistance with dictionary lookup issue
>>>
>>> I'll take a look at the patch. Also be aware of
>>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
>>> enhancing performance  -- if willing to assume annotations (BaseTokens
>>> currently) are sorted. Currently it's always BaseToken and always sorted,
>>> just not sure if we want to code to that assumption.
>>>
>>> ________________________________________
>>> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
>>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
>>> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
>>> Sent: Monday, February 04, 2013 3:43 PM
>>> To: ctakes-dev@incubator.apache.org
>>> Subject: assistance with dictionary lookup issue
>>>
>>> Pei helped me track down an issue with performance I'd noticed in the
>>> dictionary annotator, and I have filed the issue here:
>>> https://issues.apache.org/jira/browse/CTAKES-143
>>>
>>> I implemented a quick and dirty proof of concept fix and noticed dramatic
>>> performance improvement.  I attached the patch to the issue, but it
>>> involves changing an interface (currently does not try to fix other
>>> implementing classes so obviously not ready for primetime), so I wanted to
>>> solicit the list first in case anyone with better knowledge of that module
>>> has some better engineering ideas than what I came up with.
>>>
>>> Thanks,
>>>
>>> --
>>> Tim Miller, PhD
>>> Postdoctoral Research Fellow
>>> Children's Hospital Informatics Program
>>> Children's Hospital Boston and Harvard Medical School
>>> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Masanz, James J." <Ma...@mayo.edu>.

First, about the loop - I had been looking too quickly at the diff and didn't notice the logic about punctuation etc

Second, what I remember when I looked at it before, was seeing interface named LookupInitializer, which being old enough, doesn't have Iterator parameterized in the definition of the getLookupTokenIterator:
    public Iterator getLookupTokenIterator(JCas jcas)
            throws AnnotatorInitializationException;

and that ends up being effectively an Iterator<LookupToken> and LookupToken does not inherit from Annotation, and I stopped at that point. 

But now looking farther, it looks to me that that's fine because in FirstTokenPermLookupInitializerImpl, we look through the BaseTokens and create the list of LookupTokens based on the (sorted) BaseTokens, so the LookupTokens will be sorted too.

so maybe we should call the new method getSortedLookupTokens to make it clear they too are sorted

________________________________________
From: ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1146-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
Sent: Tuesday, February 05, 2013 9:10 AM
To: ctakes-dev@incubator.apache.org
Subject: Re: assistance with dictionary lookup issue

Yeah, if you mean just change the loop to iterate over the list instead
of getting an iterator that makes sense.  There is still some logic in
there to leave out punctuation tokens but I think you were implying that
to be in your mockup diff.

As for sorting, the AnnotationIndex defines a sort order for its iterators:
http://uima.apache.org/d/uimaj-2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html

so we are safe assuming that anything extending Annotation will be
iterated in sorted order.  Does that answer the questions we had? Or was
I missing something subtle in that discussion?

Tim

On 02/05/2013 09:44 AM, Masanz, James J. wrote:
> Looks good to me, with one question.
>
> Instead of getting an iterator and then building a new list, can we just skip getting the iterator and use the list that selectCovered returns?
>
> I will mock up a diff here of what I mean:
> -     Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
> -     while (btaItr.hasNext())
> -             {
> -                     BaseToken bta = (BaseToken) btaItr.next();
> -                             ltList.add(lt);
> -                     }
> -             }
>
> +     ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
>
>       return ltList;
>
> I know you said it was quick and dirty at the moment - my 2 cents - unless someone comes up with a better engineered solution, I think we could add the new method (with a name like getLookupTokens) and leave the old one so we don't have to deprecate anything. And phase in the change to the various *LookupInitializerImpl classes if needed.
>
> -- James
>
>
>> -----Original Message-----
>> From: ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org
>> [mailto:ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org]
>> On Behalf Of Masanz, James J.
>> Sent: Monday, February 04, 2013 4:01 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: RE: assistance with dictionary lookup issue
>>
>> I'll take a look at the patch. Also be aware of
>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
>> enhancing performance  -- if willing to assume annotations (BaseTokens
>> currently) are sorted. Currently it's always BaseToken and always sorted,
>> just not sure if we want to code to that assumption.
>>
>> ________________________________________
>> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
>> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
>> Sent: Monday, February 04, 2013 3:43 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: assistance with dictionary lookup issue
>>
>> Pei helped me track down an issue with performance I'd noticed in the
>> dictionary annotator, and I have filed the issue here:
>> https://issues.apache.org/jira/browse/CTAKES-143
>>
>> I implemented a quick and dirty proof of concept fix and noticed dramatic
>> performance improvement.  I attached the patch to the issue, but it
>> involves changing an interface (currently does not try to fix other
>> implementing classes so obviously not ready for primetime), so I wanted to
>> solicit the list first in case anyone with better knowledge of that module
>> has some better engineering ideas than what I came up with.
>>
>> Thanks,
>>
>> --
>> Tim Miller, PhD
>> Postdoctoral Research Fellow
>> Children's Hospital Informatics Program
>> Children's Hospital Boston and Harvard Medical School
>> 617-919-1223

Re: assistance with dictionary lookup issue

Posted by Tim Miller <ti...@childrens.harvard.edu>.

Yeah, if you mean just change the loop to iterate over the list instead 
of getting an iterator that makes sense.  There is still some logic in 
there to leave out punctuation tokens but I think you were implying that 
to be in your mockup diff.

As for sorting, the AnnotationIndex defines a sort order for its iterators:
http://uima.apache.org/d/uimaj-2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html

so we are safe assuming that anything extending Annotation will be 
iterated in sorted order.  Does that answer the questions we had? Or was 
I missing something subtle in that discussion?

Tim

On 02/05/2013 09:44 AM, Masanz, James J. wrote:
> Looks good to me, with one question.
>
> Instead of getting an iterator and then building a new list, can we just skip getting the iterator and use the list that selectCovered returns?
>
> I will mock up a diff here of what I mean:
> -	Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
> -	while (btaItr.hasNext())
> -		{
> - 			BaseToken bta = (BaseToken) btaItr.next();
> -				ltList.add(lt);
> - 			}
> - 		}
>
> +	ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
> 	
> 	return ltList;
>
> I know you said it was quick and dirty at the moment - my 2 cents - unless someone comes up with a better engineered solution, I think we could add the new method (with a name like getLookupTokens) and leave the old one so we don't have to deprecate anything. And phase in the change to the various *LookupInitializerImpl classes if needed.
>
> -- James
>
>
>> -----Original Message-----
>> From: ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org
>> [mailto:ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org]
>> On Behalf Of Masanz, James J.
>> Sent: Monday, February 04, 2013 4:01 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: RE: assistance with dictionary lookup issue
>>
>> I'll take a look at the patch. Also be aware of
>> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
>> enhancing performance  -- if willing to assume annotations (BaseTokens
>> currently) are sorted. Currently it's always BaseToken and always sorted,
>> just not sure if we want to code to that assumption.
>>
>> ________________________________________
>> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
>> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
>> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
>> Sent: Monday, February 04, 2013 3:43 PM
>> To: ctakes-dev@incubator.apache.org
>> Subject: assistance with dictionary lookup issue
>>
>> Pei helped me track down an issue with performance I'd noticed in the
>> dictionary annotator, and I have filed the issue here:
>> https://issues.apache.org/jira/browse/CTAKES-143
>>
>> I implemented a quick and dirty proof of concept fix and noticed dramatic
>> performance improvement.  I attached the patch to the issue, but it
>> involves changing an interface (currently does not try to fix other
>> implementing classes so obviously not ready for primetime), so I wanted to
>> solicit the list first in case anyone with better knowledge of that module
>> has some better engineering ideas than what I came up with.
>>
>> Thanks,
>>
>> --
>> Tim Miller, PhD
>> Postdoctoral Research Fellow
>> Children's Hospital Informatics Program
>> Children's Hospital Boston and Harvard Medical School
>> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Masanz, James J." <Ma...@mayo.edu>.

Looks good to me, with one question.

Instead of getting an iterator and then building a new list, can we just skip getting the iterator and use the list that selectCovered returns?

I will mock up a diff here of what I mean:
-	Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
-	while (btaItr.hasNext())
-		{
- 			BaseToken bta = (BaseToken) btaItr.next();
-				ltList.add(lt);
- 			}
- 		}

+	ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
	
	return ltList;

I know you said it was quick and dirty at the moment - my 2 cents - unless someone comes up with a better engineered solution, I think we could add the new method (with a name like getLookupTokens) and leave the old one so we don't have to deprecate anything. And phase in the change to the various *LookupInitializerImpl classes if needed.

-- James


> -----Original Message-----
> From: ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1138-Masanz.James=mayo.edu@incubator.apache.org]
> On Behalf Of Masanz, James J.
> Sent: Monday, February 04, 2013 4:01 PM
> To: ctakes-dev@incubator.apache.org
> Subject: RE: assistance with dictionary lookup issue
> 
> I'll take a look at the patch. Also be aware of
> https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of
> enhancing performance  -- if willing to assume annotations (BaseTokens
> currently) are sorted. Currently it's always BaseToken and always sorted,
> just not sure if we want to code to that assumption.
> 
> ________________________________________
> From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org
> [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on
> behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
> Sent: Monday, February 04, 2013 3:43 PM
> To: ctakes-dev@incubator.apache.org
> Subject: assistance with dictionary lookup issue
> 
> Pei helped me track down an issue with performance I'd noticed in the
> dictionary annotator, and I have filed the issue here:
> https://issues.apache.org/jira/browse/CTAKES-143
> 
> I implemented a quick and dirty proof of concept fix and noticed dramatic
> performance improvement.  I attached the patch to the issue, but it
> involves changing an interface (currently does not try to fix other
> implementing classes so obviously not ready for primetime), so I wanted to
> solicit the list first in case anyone with better knowledge of that module
> has some better engineering ideas than what I came up with.
> 
> Thanks,
> 
> --
> Tim Miller, PhD
> Postdoctoral Research Fellow
> Children's Hospital Informatics Program
> Children's Hospital Boston and Harvard Medical School
> 617-919-1223

RE: assistance with dictionary lookup issue

Posted by "Masanz, James J." <Ma...@mayo.edu>.

I'll take a look at the patch. Also be aware of https://issues.apache.org/jira/browse/CTAKES-31 which talks about a way of enhancing performance  -- if willing to assume annotations (BaseTokens currently) are sorted. Currently it's always BaseToken and always sorted, just not sure if we want to code to that assumption.

________________________________________
From: ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org [ctakes-dev-return-1137-Masanz.James=mayo.edu@incubator.apache.org] on behalf of Tim Miller [timothy.miller@childrens.harvard.edu]
Sent: Monday, February 04, 2013 3:43 PM
To: ctakes-dev@incubator.apache.org
Subject: assistance with dictionary lookup issue

Pei helped me track down an issue with performance I'd noticed in the
dictionary annotator, and I have filed the issue here:
https://issues.apache.org/jira/browse/CTAKES-143

I implemented a quick and dirty proof of concept fix and noticed
dramatic performance improvement.  I attached the patch to the issue,
but it involves changing an interface (currently does not try to fix
other implementing classes so obviously not ready for primetime), so I
wanted to solicit the list first in case anyone with better knowledge of
that module has some better engineering ideas than what I came up with.

Thanks,

--
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Children's Hospital Boston and Harvard Medical School
617-919-1223