Posted to dev@ctakes.apache.org by Matthew Vita <ma...@gmail.com> on 2019/07/07 17:05:16 UTC

Re: MySQL web rest version unstable (with question) and note about official web rest Dockerfile

Hi Gandhi, Tim, and Sean,

Here are a couple of updates on the cTAKES Rest Service
<https://github.com/GoTeamEpsilon/ctakes-rest-service> (MySQL) work I’ve
been doing, plus a major issue that popped up.


1) I’ve completely revamped the README to be more streamlined and accurate.
The important changes to the doc are overall structure and language
improvements, Tomcat setup instructions that now point at the new release
file (the old one went away), precise instructions on targeting Java 8
instead of 11, and various system notes.

As for the codebase, I did some small cleanups, removed a security
vulnerability, and incorporated cTAKES SVN Revision 1850060
<https://svn.apache.org/viewvc?view=revision&revision=1850060>.



2) This thread has so far focused on accuracy differences between the
cTAKES REST Service (MySQL) and regular cTAKES (recall I mentioned that ‘severe
bipolar i disorder’ isn’t recognized correctly by the REST service but is
by regular cTAKES - complete QA documentation here
<https://github.com/MatthewVita/cTAKES-Special-Case-QA>). However, that was
the “tip of the iceberg” as they say - only simple inputs such as “patient
has hypertension” or “diabetes” produce correct results. The long string
“Dr. Nutritious Medical Nutrition Therapy for Hyperlipidemia...and so on...”
found in the cTAKES 4.0 User Install Guide
<https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+User+Install+Guide>,
for example, produces no results at all.
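
For anyone who wants to reproduce the short-versus-long comparison, here is a
minimal Python sketch. It assumes the service is deployed at the same
localhost URL used in the cURL call quoted further down this thread. The
helper names (build_request, analyze) are made up for illustration, and the
response is returned as raw text so that no particular response format is
assumed.

```python
import urllib.parse
import urllib.request

# Assumption: same endpoint as the cURL call quoted later in this thread.
BASE_URL = "http://localhost:8080/ctakes-web-rest/service/analyze"


def build_request(text, pipeline="Default", base_url=BASE_URL):
    """Build (but do not send) a POST request carrying `text` as the raw body."""
    query = urllib.parse.urlencode({"pipeline": pipeline})
    return urllib.request.Request(
        f"{base_url}?{query}",
        data=text.encode("utf-8"),
        headers={"Cache-Control": "no-cache"},
        method="POST",
    )


def analyze(text, pipeline="Default", timeout=60):
    """Send `text` to the service and return the response body as a string."""
    request = build_request(text, pipeline)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling analyze("patient has hypertension") and then analyze(...) on the long
install-guide paragraph, and comparing the two response bodies, should show
whether the longer input really comes back with no findings.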


Anyway, I am working on this issue (detailed here
<https://github.com/GoTeamEpsilon/ctakes-rest-service/issues/65>). Also,
I’d like to point out that the issue appeared before I made my minor code
changes. I have rigorously tested fresh installs on Ubuntu VMs and concluded
that it’s a major issue. I would appreciate any feedback/thoughts on why
cTAKES, under the hood or elsewhere, may be doing this. I’m most concerned
with larger inputs resulting in no findings.
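
One rough way to narrow down where the two setups diverge is to compare which
element types show up in each XML output. The sketch below only assumes that
both outputs are well-formed XML. The function names are hypothetical, and the
two sample strings are hand-written placeholders, not real cTAKES output.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


def missing_tags(reference_xml, candidate_xml):
    """Return tags that occur more often in the reference than in the candidate."""
    # Counter subtraction keeps only tags with a positive difference.
    return dict(tag_counts(reference_xml) - tag_counts(candidate_xml))


# Hypothetical, hand-written samples (not real cTAKES output).
REFERENCE = "<doc><Sentence/><DiseaseDisorderMention/><WordToken/></doc>"
CANDIDATE = "<doc><Sentence/><WordToken/></doc>"
```

Passing the regular cTAKES XML as the reference and the REST XML as the
candidate would list the element types the REST service fails to produce.
Namespaced elements appear in the result with their namespace URI in braces.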

-m


P.S.: When the repository is stable again, I’d like to see if renaming
“cTAKES Rest Service” to “cTAKES MySQL Rest Service” would make sense.
Also, perhaps it would make sense to link to it from the official
ctakes-web-rest’s README for those looking to use MySQL (wouldn’t break
licensing rules, as far as I know).




On Thu, Jun 13, 2019 at 9:34 PM Matthew Vita <ma...@gmail.com>
wrote:

> Hi Gandhi,
>
> Sorry if I wasn't clear or my README.md didn't completely make sense.
>
> I tested with both full and default pipelines. The results can be found
> here:
> https://github.com/MatthewVita/cTAKES-Special-Case-QA/tree/master/REST%20MySQL%20Results
>
> Thanks,
> Matthew Vita
>
>
>
> On Thu, Jun 13, 2019 at 1:04 PM gandhi rajan <ga...@gmail.com>
> wrote:
>
>> Hi Matt,
>>
>> As I mentioned earlier, did you try using the full pipeline instead of
>> the default pipeline? Ctakes web rest relies on the output XML of the
>> ctakes engine.
>>
>> The only way to figure out the problem is to understand which pipers are
>> used in both cases.
>>
>> On Thursday, June 13, 2019, Matthew Vita <ma...@gmail.com> wrote:
>>
>> > Hi Gandhi,
>> >
>> > Again, sorry for the late reply!
>> >
>> > In terms of presenting and comparing outputs with the same phrase and
>> > multiple pipelines, I have put together a comprehensive (throw-away)
>> > repository that shows and explains the results between the proper cTAKES
>> > system and the MySQL REST one. The phrase was: severe bipolar i disorder.
>> >
>> > I tried to keep the README limited, but informative along with useful
>> > commentary. Each result is neatly packaged under its system and pipeline:
>> > https://github.com/MatthewVita/cTAKES-Special-Case-QA
>> >
>> > Please let me know if there are other debugging approaches to try with
>> > this issue. I'm not quite sure of how to move forward :).
>> >
>> >
>> > Thanks,
>> > Matthew Vita
>> >
>> >
>> >
>> > On Wed, Jun 5, 2019 at 11:23 PM Matthew Vita <ma...@gmail.com>
>> > wrote:
>> >
>> > > Sorry for the delay (work is busy) - will report back soon with the
>> > > pipelines.
>> > >
>> > > Thanks,
>> > > Matthew Vita
>> > >
>> > >
>> > >
>> > > On Mon, Jun 3, 2019 at 8:19 AM gandhi rajan <ga...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Matt, we gotta see what types of pipelines are used in both cases.
>> > >> Did you try using pipeline=full instead of pipeline=default?
>> > >>
>> > >> Full pipeline can give more information I guess.
>> > >>
>> > >> On Monday, June 3, 2019, Matthew Vita <ma...@gmail.com> wrote:
>> > >>
>> > >> > Hi Gandhi and All,
>> > >> >
>> > >> > (Correction: my previous statement about the MySQL web rest version
>> > >> > “not working in its current state” is only partially true. I was able
>> > >> > to HTTP POST “Hypertension” and get correct results. However, I’ll be
>> > >> > showing that it’s not working for all cases below.)
>> > >> >
>> > >> > My testing/debugging as of today was to set up the following
>> > >> > environments and compare the XMLs:
>> > >> >
>> > >> >
>> > >> >    1. Environment #1 - cTAKES Web Rest MySQL version @ 1850060 (with
>> > >> >       output xml on, per Gandhi) with the resource data loaded in via
>> > >> >       plain SQL
>> > >> >    2. Environment #2 - cTAKES proper @ 1850060 with the resources data
>> > >> >       loaded on disk
>> > >> >
>> > >> >
>> > >> > This setup allows for the data to be the same in either MySQL or HSQLDB.
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > Furthermore, I made sure that the MySQL database had the following
>> > >> > entries because I chose to use ‘severe bipolar i disorder’ as my test
>> > >> > string:
>> > >> >
>> > >> >    - cui_terms(236784,12,13,'severe bipolar i disorder , most recent episode mixed , with psychotic features','features')
>> > >> >    - TUI(236784,48)
>> > >> >    - PREFTERM(236784,'Severe mixed bipolar I disorder with psychotic features')
>> > >> >    - SNOMEDCT_US(236784,10981006)
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > Here’s the result of using the regular cTAKES setup with CVD and
>> > >> > AggregatePlaintextFastUMLSProcessor:
>> > >> >
>> > >> > severe_bipolar_i_disorder_cvd.xml -
>> > >> > https://gist.github.com/MatthewVita/93000a05a5d0f4ef6a4267359c63b510
>> > >> >
>> > >> > Here’s the result of using cTAKES Web Rest MySQL with cURL:
>> > >> >
>> > >> > curl -X POST \
>> > >> >  'http://localhost:8080/ctakes-web-rest/service/analyze?pipeline=Default' \
>> > >> >  -H 'cache-control: no-cache' \
>> > >> >  -d 'severe bipolar i disorder'
>> > >> >
>> > >> > severe_bipolar_i_disorder_rest.xml -
>> > >> > https://gist.github.com/MatthewVita/341f8c9a3552f3db9352917b810a20b0
>> > >> >
>> > >> >
>> > >> > The results show that the CVD results are much better. Rest doesn’t
>> > >> > even pick up on the main disorder. *Any thoughts or more debugging
>> > >> > ideas are welcomed!*
>> > >> >
>> > >> >
>> > >> >
>> > >> > *Sort of unrelated:* I have a good amount of work getting the MySQL
>> > >> > version’s README instructions cleaned up and removing some other bugs
>> > >> > in the issue tracker. I wonder if it would be Apache license compliant
>> > >> > for the main SVN web rest to link to this one? Perhaps this repo can be
>> > >> > changed to “GoTeamEpsilon/ctakes-mysql-rest-service”?
>> > >> >
>> > >> >
>> > >> > Thanks,
>> > >> > Matthew Vita
>> > >> >
>> > >> >
>> > >> >
>> > >> > On Fri, May 31, 2019 at 10:11 PM gandhi rajan <gandhirajan.n@gmail.com>
>> > >> > wrote:
>> > >> >
>> > >> > > Hi Matt, I would check whether the XML output from cTAKES contains
>> > >> > > the terms to isolate the issue.
>> > >> > >
>> > >> > > On Saturday, June 1, 2019, Matthew Vita <matthewvita48@gmail.com> wrote:
>> > >> > >
>> > >> > > > Hi Jeff,
>> > >> > > >
>> > >> > > > Not sure I ran into that same issue. Sorry.
>> > >> > > >
>> > >> > > > In terms of MySQL, I suppose it is faster because it's not in-memory
>> > >> > > > based (to be fair, HSQLDB can utilize disks). Another factor is that
>> > >> > > > you can load balance multiple servers in a "stateless" way if you had
>> > >> > > > a heavy load environment because the MySQL stands alone.
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > > Hi Gandhi,
>> > >> > > >
>> > >> > > > I'm using trunk@1850060 with the MySQL-based codebase on Github.
>> > >> > > > Everything builds and it even connects to all of the tables and
>> > >> > > > models, however, it doesn't pick up terms.
>> > >> > > >
>> > >> > > > Where do you think is a good place to start, with respect to
>> > >> > > > debugging? The frustrating part is there's no errors in the catalina
>> > >> > > > logs :).
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Matthew Vita
>> > >> > > >
>> > >> > > >
>> > >> > > >
>> > >> > > > On Thu, May 30, 2019 at 2:52 PM gandhi rajan <gandhirajan.n@gmail.com>
>> > >> > > > wrote:
>> > >> > > >
>> > >> > > > > Hi Matt,
>> > >> > > > >
>> > >> > > > > The ctakes web rest module in cTAKES svn trunk is the latest which I
>> > >> > > > > checked in and later modified by Tim.
>> > >> > > > >
>> > >> > > > > On Thursday, May 30, 2019, Matthew Vita <matthewvita48@gmail.com> wrote:
>> > >> > > > >
>> > >> > > > > > Hi Gandhi, Tim, Sean, and Community,
>> > >> > > > > >
>> > >> > > > > > I’ve been fixing up some of the README instructions for
>> > >> > > > > > https://github.com/GoTeamEpsilon/ctakes-rest-service on my local.
>> > >> > > > > > Unfortunately, it’s not working in its current state. I'm still
>> > >> > > > > > debugging it - is
>> > >> > > > > > svn co https://svn.apache.org/repos/asf/ctakes/trunk@1850060 ctakes
>> > >> > > > > > still the best version of cTAKES to base web-rest on?
>> > >> > > > > >
>> > >> > > > > > Also, it looks like the ctakes-web-rest Dockerfile in the
>> > >> official
>> > >> > > > > > repository is pointing to a broken Tomcat link:
>> > >> > > > > >
>> > >> > > > > > *“The requested URL
>> > >> > > > > > /pub/software/apache/tomcat/tomcat-9/v9.0.14/bin/apache-tomcat-9.0.14.zip
>> > >> > > > > > was not found on this server.”*
>> > >> > > > > >
>> > >> > > > > > There appear to be updated releases here:
>> > >> > > > > > http://mirror.cc.columbia.edu/pub/software/apache/tomcat/tomcat-9/ -
>> > >> > > > > > hope that helps.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Talk soon,
>> > >> > > > > > Matthew
>> > >> > > > > >
>> > >> > > > >
>> > >> > > > >
>> > >> > > > > --
>> > >> > > > > Regards,
>> > >> > > > > Gandhi
>> > >> > > > >
>> > >> > > > > "The best way to find urself is to lose urself in the
>> service of
>> > >> > others
>> > >> > > > > !!!"
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >> > >
>> > >> > > --
>> > >> > > Regards,
>> > >> > > Gandhi
>> > >> > >
>> > >> > > "The best way to find urself is to lose urself in the service of
>> > >> others
>> > >> > > !!!"
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> > >> --
>> > >> Regards,
>> > >> Gandhi
>> > >>
>> > >> "The best way to find urself is to lose urself in the service of
>> others
>> > >> !!!"
>> > >>
>> > >
>> >
>>
>>
>> --
>> Regards,
>> Gandhi
>>
>> "The best way to find urself is to lose urself in the service of others
>> !!!"
>>
>