Posted to dev@ctakes.apache.org by Akram <as...@yahoo.com.INVALID> on 2020/05/24 01:29:12 UTC

using UMLS Metathesaurus in cTAKES offline

I want to use cTAKES offline.

I am using this command line:

    run\runClinicalPipeline  -i E:\cTAKES\files\MedReps\Input  --xmiOut E:\cTAKES\files\MedReps\Output  --user myuser  --pass mypassword

This is my piper file:

    load DefaultTokenizerPipeline.piper
    add DefaultJCasTermAnnotator
    load AttributeCleartkSubPipe.piper
    writeXmis


I thought UMLS would be accessed once to download all needed files, so that later runs would not need internet access to UMLS, but I was wrong.

When I work offline, cTAKES fails while attempting to access UMLS and reports an error.

I found that UMLS offers its data for download, so I downloaded umls-2020AA-full.zip.

I extracted the Metathesaurus using MetamorphoSys and added it to

    E:\cTAKES\resources\org\apache\ctakes\dictionary\lookup\umls2020aa

It is a huge folder (30 GB+) full of .RRF files, but it did not work.
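
For orientation, the .RRF files produced by MetamorphoSys are pipe-delimited text tables. The sketch below reads one row of MRCONSO.RRF (the concept names table) and picks out a few columns. The class name `MrconsoRow` and the sample row are illustrative assumptions, and the sketch only shows what the folder contains; it does not address whether cTAKES can use such files directly.

```java
public class MrconsoRow {
    // 0-based column positions in MRCONSO.RRF.
    static final int CUI = 0;   // concept unique identifier
    static final int LAT = 1;   // language of the term
    static final int SAB = 11;  // source vocabulary abbreviation
    static final int TTY = 12;  // term type in the source
    static final int CODE = 13; // code in the source vocabulary
    static final int STR = 14;  // the term string itself
    static final int FIELD_COUNT = 18;

    /** Splits one RRF row into fields, keeping empty columns. */
    public static String[] parse(String line) {
        // A negative limit keeps empty fields, including the one after the trailing '|'.
        String[] fields = line.split("\\|", -1);
        if (fields.length < FIELD_COUNT) {
            throw new IllegalArgumentException(
                "Expected at least " + FIELD_COUNT + " fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Illustrative sample row (an assumption, not taken from the thread).
        String sample = "C0000005|ENG|P|L0000005|PF|S0007492|Y|A26634265||M0019694|"
            + "D012711|MSH|PEP|D012711|(131)I-Macroaggregated Albumin|0|N|256|";
        String[] f = parse(sample);
        System.out.println(f[CUI] + " [" + f[SAB] + "/" + f[TTY] + "] " + f[STR]);
    }
}
```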

I am not sure where the problem is.

Do I have to change the piper file?

Do I have to change the command?

Do I have to change files in the umls2020aa folder?


How can I fix these problems and use cTAKES offline?


Re: using UMLS Metathesaurus in cTAKES offline [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Akram

It's not a matter of instructions.  It hasn't been made easy because,
obviously, we don't want people to do it.  It's a matter of writing code
that serves as an intermediary between ctakes and the UMLS authentication
service, or a simulated version thereof.  As an example, I have a use case
where the instance of ctakes is buried deep in a PHI-protected environment
where we have forbidden web services to connect directly with the outside
world.  So I've written a proxy mechanism that stands half inside the
protected perimeter and half outside.  But we're still authenticating.

Perhaps you can figure out a strategy for authenticating and creating a
time-limited token that can be taken offline.  The hook to do something
like that is the Java system property ctakes.umlsaddr, set on the command
line as -Dctakes.umlsaddr.
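
A minimal sketch of how a JVM system property such as ctakes.umlsaddr can be supplied and read. Only the property name comes from this message; the class `UmlsAddrProperty`, the method `resolveAuthAddress`, and the fallback URL are hypothetical placeholders, and this is not how cTAKES itself is known to read the value.

```java
import java.net.URI;

public class UmlsAddrProperty {
    // Property name taken from the message; everything else is hypothetical.
    static final String PROPERTY = "ctakes.umlsaddr";
    // Placeholder only, not the real UMLS authentication endpoint.
    static final String FALLBACK = "https://auth.example.invalid/validate";

    /** Returns the configured address, or the fallback when the property is unset or blank. */
    public static URI resolveAuthAddress() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return URI.create(FALLBACK);
        }
        return URI.create(value.trim());
    }

    public static void main(String[] args) {
        System.out.println("Authentication requests would go to: " + resolveAuthAddress());
    }
}
```

Run as `java -Dctakes.umlsaddr=http://localhost:8080/umls UmlsAddrProperty` to see the override take effect; the localhost address is likewise only an example of where an intermediary might listen.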

Peter



On Mon, May 25, 2020 at 10:56 AM Akram <as...@yahoo.com.invalid> wrote:

>  Thanks Sean & Peter
> I totally appreciate what NLM and cTAKES are doing.
> you said "It is possible to use ctakes and the UMLS dictionary completely
> offline"
> How can I use them offline? Is there any document that shows how? If not,
> then could you please help me with instructions.
> Best Regards

Re: using UMLS Metathesaurus in cTAKES offline [EXTERNAL]

Posted by Akram <as...@yahoo.com.INVALID>.
Thanks Sean & Peter,
I totally appreciate what NLM and cTAKES are doing.
You said, "It is possible to use ctakes and the UMLS dictionary completely offline."
How can I use them offline? Is there any document that shows how? If not, could you please help me with instructions?
Best Regards

    On Monday, 25 May 2020, 11:26:04 pm AEST, Finan, Sean <se...@childrens.harvard.edu> wrote:  
 
 Peter is absolutely correct.

It is possible to use ctakes and the UMLS dictionary completely offline, but it isn't recommended for regular use.  If you have any way to connect to the internet please use the standard methods.

Many years ago the initial creators of ctakes negotiated with the NLM to enable the unique manner in which ctakes uses the UMLS.  At the time it had never been done and the UMLS could not be redistributed by outside agencies.  A legal (and amicable) partnership between the NLM and ctakes is absolutely necessary, and upholding our side of the agreement is how we make that happen.

NLM maintains the umls and without proof of importance this maintenance would cease.  

NLM grants are one mechanism by which ctakes development gets funding, so it really is important that they know who is using the UMLS and how frequently.  There is no detriment to providing them this information.  You will never be charged for use, no matter how heavy it may be.

The NLM sends annual requests for users to complete a survey.  It is extremely important that you complete the survey and indicate that you use the UMLS for NLP and ctakes.  The NLM and other agencies fund projects in part upon user-base.  The larger the user base of ctakes, the greater the chance of funding for its development.  Any funding in development turns into more accurate annotation engines, more capabilities and simpler usage for everybody.

Of course, private funding would also help ...

Sean



Re: using UMLS Metathesaurus in cTAKES offline [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Peter is absolutely correct.

It is possible to use ctakes and the UMLS dictionary completely offline, but it isn't recommended for regular use.  If you have any way to connect to the internet please use the standard methods.

Many years ago the initial creators of ctakes negotiated with the NLM to enable the unique manner in which ctakes uses the UMLS.  At the time it had never been done and the UMLS could not be redistributed by outside agencies.  A legal (and amicable) partnership between the NLM and ctakes is absolutely necessary, and upholding our side of the agreement is how we make that happen.

NLM maintains the umls and without proof of importance this maintenance would cease.  

NLM grants are one mechanism by which ctakes development gets funding, so it really is important that they know who is using the UMLS and how frequently.  There is no detriment to providing them this information.  You will never be charged for use, no matter how heavy it may be.

The NLM sends annual requests for users to complete a survey.  It is extremely important that you complete the survey and indicate that you use the UMLS for NLP and ctakes.  The NLM and other agencies fund projects in part upon user-base.  The larger the user base of ctakes, the greater the chance of funding for its development.  Any funding in development turns into more accurate annotation engines, more capabilities and simpler usage for everybody.

Of course, private funding would also help ...

Sean


________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Sunday, May 24, 2020 2:49 AM
To: dev@ctakes.apache.org; Akram
Subject: Re: using UMLS Metathesaurus in cTAKES offline [EXTERNAL]


Having the data is not synonymous with umls authentication. Out of the box,
you do need internet connectivity for the authentication to take place.
 It will happen once during startup. That will be sufficient for as long as
the instance is running.

The authentication is really meant to be umls' way to measure usage as much
as it is a permission scheme.

There is a mechanism for the authentication to be proxied through a
different url which can be built upon to create something like you're
thinking of. I've used that, but for a different purpose.

But in these days of ever diminishing government, it's valuable for the nlm
to have those authentication hits.

Peter


Re: using UMLS Metathesaurus in cTAKES offline

Posted by Peter Abramowitsch <pa...@gmail.com>.
Having the data is not synonymous with UMLS authentication. Out of the box,
you do need internet connectivity for the authentication to take place.
It will happen once during startup. That will be sufficient for as long as
the instance is running.
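
The "once during startup" behaviour can be pictured with a small sketch: the first check calls a validator, and the result is remembered for the life of the process. This is an illustration of the pattern only, not cTAKES code; the class `StartupAuthGate` is hypothetical, and the `BooleanSupplier` stands in for whatever network call performs the real UMLS check.

```java
import java.util.function.BooleanSupplier;

public class StartupAuthGate {
    private final BooleanSupplier validator; // stand-in for the real network call
    private Boolean approved;                // null until the first check has run

    public StartupAuthGate(BooleanSupplier validator) {
        this.validator = validator;
    }

    /** Runs the validator on the first call only, then returns the cached result. */
    public synchronized boolean isApproved() {
        if (approved == null) {
            approved = validator.getAsBoolean();
        }
        return approved;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical validator that always approves and counts its invocations.
        StartupAuthGate gate = new StartupAuthGate(() -> {
            calls[0]++;
            return true;
        });
        for (int i = 0; i < 3; i++) {
            gate.isApproved();
        }
        System.out.println("validator calls: " + calls[0]);
    }
}
```

Note that this sketch caches a failed check as well, so a process started without connectivity would stay unapproved until restarted.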

The authentication is really meant to be umls' way to measure usage as much
as it is a permission scheme.

There is a mechanism for the authentication to be proxied through a
different url which can be built upon to create something like you're
thinking of. I've used that, but for a different purpose.

But in these days of ever diminishing government, it's valuable for the nlm
to have those authentication hits.

Peter
