Posted to dev@ctakes.apache.org by Rick Coleman <rc...@jilocasin.net> on 2023/02/03 20:14:57 UTC

Crash course in cTakes

Hello everyone,

Can anyone point me to an exhaustive set of documentation regarding cTakes?

The main site feels like it was written by a marketing major, lots of 
flash and catchiness, but little in the way of detailed documentation.  
Even the User Install Guide and the Developer Install guide read like 
what they are, install guides.

For example:
Is cTakes the whole package, or just the front end?

If it's just the front end, what's the back end?

It mentions using my UMLS credentials, can you use a local copy of the 
relevant UMLS data?  If so how?

Are the listed requirements (1GB drive space, Oracle Java 1.8) the 
minimum or the recommended?  What about RAM or CPU?  Is non-Oracle Java 
acceptable?  What about Java 17, the current LTS version?

So, does anyone know where I can find out this information?


Thanks.

rik.


Re: Crash course in cTakes [EXTERNAL]

Posted by Rick Coleman <rc...@jilocasin.net>.
John,

Great news, have you thought about posting your code to the public 
repository (does this project even have a GitHub or other public project 
repo?) so that the rest of the devs can build on, or at least reference 
your work to date?

I for one would love to take a peek inside the kimono as it were.


My other suggestion was to remove the "or higher" verbiage from the 
download page.  That is where *users* will be downloading, and given 
that text they will expect it to work 'out of the box' on later 
versions of Java, such as the current LTS version, 17.


rik.

On 2/6/23 11:47, Petersam, John Contractor wrote:
> Hi Rik,
> I understand your feelings.  Not everyone has the ability to upgrade things.  I was responding because I wanted folks to know it was possible.  My use case is different than what I typically see here.  I've got an enterprise solution that uses cTAKES to process 7 million pages of text daily.  I forked my version years ago because of inefficiencies in certain parts of the library (e.g. the dependency parser) that made it otherwise impractical to deploy for my level of volume.  When version 4 came out, I performed a manual diff of the code to get the updates.  I expect to do the same for version 5 (if for no other reason, to speed up the build time).
>
> Since then, we've performed regular upgrades.  Most of them were pretty straightforward, which is why I don't think getting it past 1.8 would be a huge issue for most developers.  I know I made a few modifications when updating Spring (I believe from 3.x to 4.x, but it's been a while and I'm currently on 6.0.3), and a couple when updating Lucene.  I also know that the most recent upgrade to Hibernate was a bit painful.  In fact, there were a couple of functions in ytex that I simply commented out since they weren't being referenced anyway.  But those are the only "hitches" I've ever had during a migration.
>
> I would strongly suggest anyone doing the upgrade make sure they do a full regression test.  I don't trust standard JUnit tests because of the nature of NLP, so I actually compare annotations on a 3 million page test set to ensure that nothing is broken.  That's probably a larger set than most of you need, but obviously more is better.
>
> Hope this helps,
> John
>
> -----Original Message-----
> From: Rick Coleman<rc...@jilocasin.net>  
> Sent: Monday, February 06, 2023 8:21 AM
> To:dev@ctakes.apache.org
> Subject: Re: Crash course in cTakes [EXTERNAL]
>
> John,
>
> That's good to hear, and as Sean remarked, details would be great.
>
> Unfortunately, I don't think we should be expecting non-dev users to have to update dependencies and make code changes since the download page said it works with 1.8 or higher....
>
>
> rik.
>
> On 2/6/23 07:13, Petersam, John Contractor wrote:
>> Hi Rik,
>> I run mine on Java 19, so it can be done.  But I have also updated dependencies and made code modifications to support it.
>>
>> Thanks,
>> John
>>
>> -----Original Message-----
>> From: Rick Coleman<rc...@jilocasin.net>
>> Sent: Friday, February 03, 2023 5:59 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: Crash course in cTakes [EXTERNAL]
>>
>> Sean,
>>
>> Thanks for getting back to me on this.  I was afraid that was what the answer was going to be.
>>
>> I appreciate you taking the time to fill in some of the gaps.  If it's so dependent on Java 1.8, someone should probably remove the "or higher" on the download page.
>>
>>
>> I look forward to getting this application up and running.
>>
>> Until then,
>>
>> rik.
>>
>> On 2/3/23 15:57, Finan, Sean wrote:
>>> Hi Rick,
>>>
>>> Thank you for the questions and for reminding us that the documentation is sparse, outdated and not very detailed.  Everybody needs a prod now and then to get things done.
>>>
>>> I hope that we can get a solid README and Wiki going on GitHub, as well as an update to the primary website.  It will take a lot of work and some cooperation by committers and users alike.
>>>
>>> I have tried to address your questions inline below.
>>>
>>> Sean
>>>
>>> ________________________________
>>> From: Rick Coleman<rc...@jilocasin.net>
>>> Sent: Friday, February 3, 2023 3:14 PM
>>> To: dev@ctakes.apache.org <de...@ctakes.apache.org>
>>> Subject: Crash course in cTakes [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hello everyone,
>>>
>>> Can anyone point me to an exhaustive set of documentation regarding cTakes?
>>>
>>>      *   Not really.  The wiki that you found is the most that there is.
>>>      *   Most information is scattered across emails written on the dev and user lists.  You can search them here: https://apache.markmail.org/
>>>
>>> The main site feels like it was written by a marketing major, lots of
>>> flash and catchiness, but little in the way of detailed documentation.
>>> Even the User Install Guide and the Developer Install guide read like
>>> what they are, install guides.
>>>
>>> For example:
>>> Is cTakes the whole package, or just the front end?
>>>
>>>      *   ctakes is a clinical nlp platform (vague enough?).   I would say "whole package", but extendable.
>>>      *   It is built on Apache UIMA and allows users to create pipelines of various nlp and i/o components.
>>>      *   It comes with many components that have been built for clinical nlp.
>>>      *   It is extendable; UIMA components from other sources can be placed in the pipelines.
>>>      *   There are front-ends for some tasks, such as running a pipeline or creating a custom dictionary.
>>>
>>> If it's just the front end, what's the back end?
>>>
>>>      *   I would say that each UIMA component is a bit of back-end, as is the controller that actually runs the pipeline.
>>>      *   As mentioned above, you can extend it with non-ctakes back-end components.
>>>
>>> It mentions using my UMLS credentials, can you use a local copy of
>>> the relevant UMLS data?  If so how?
>>>
>>>      *   If you are compiling and running the source then ctakes will automatically download a default dictionary.
>>>      *   If you are running a packaged binary then you'll need to manually pull down a dictionary.
>>>      *   Before ctakes 5, downloading, unzipping and copying the dictionary was a manual process.
>>>      *   If you are using v5 then you can run bin/getUmlsDictionary and a simple gui will do it for you.
>>>      *   You can also create your own custom dictionary.
>>>      *   The wiki has a page on the dictionary creator gui.
>>>      *   There are instructions on youtube that start with first steps.
>>>
>>> Are the listed requirements (1GB drive space, Oracle Java 1.8) the
>>> minimum or the recommended?  What about RAM or CPU?  Is non-Oracle
>>> Java acceptable?  What about Java 17, the current LTS version?
>>>
>>>      *   Disk: > 1GB
>>>      *   Java: == 1.8
>>>      *   RAM: > 2GB  (>= 4 recommended)
>>>      *   CPU: >= 64bit
>>> OpenJDK seems to be fine.
>>>
>>> Every java release past 8 is bad for ctakes.  ctakes has a lot of dependencies, many of which are old and rely on a java 8 feature here and there.  ctakes itself probably requires a java 8 specific feature here and there, but I honestly don't know.  Unfortunately, ctakes needs a serious update effort - maybe for v6.  Part of the problem is actually its capabilities and versatility - the many available components and workflows.  A 'minor' change can require a dozen end-to-end tests in dev and user environments on multiple platforms.  Unit tests do not suffice.
>>>
>>>
>>> So, does anyone know where I can find out this information?
>>>
>>>
>>> Thanks.
>>>
>>> rik.
>>>
>>>
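
To make Sean's description above more concrete (cTAKES as a set of components chained into a pipeline, each one adding to a shared document), here is a minimal sketch in plain Java.  It is NOT the UIMA or cTAKES API; the names Doc, Component, Pipeline and PipelineDemo are hypothetical and exist only to show the general shape of the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the shared document that every component reads and annotates.
final class Doc {
    final String text;
    final List<String> annotations = new ArrayList<>();

    Doc(String text) {
        this.text = text;
    }
}

// Hypothetical component contract: inspect the document, add annotations.
interface Component {
    void process(Doc doc);
}

// Hypothetical controller: runs each component in order on the same document.
final class Pipeline {
    private final List<Component> components;

    Pipeline(Component... components) {
        this.components = Arrays.asList(components);
    }

    Doc run(String text) {
        Doc doc = new Doc(text);
        for (Component c : components) {
            c.process(doc);
        }
        return doc;
    }
}

final class PipelineDemo {
    static Doc demo(String text) {
        // First component: whitespace tokenizer.
        Component tokenizer = doc -> {
            for (String token : doc.text.trim().split("\\s+")) {
                doc.annotations.add("Token:" + token);
            }
        };
        // Second component: builds on what the first one produced.
        Component counter = doc -> doc.annotations.add("TokenCount:" + doc.annotations.size());
        return new Pipeline(tokenizer, counter).run(text);
    }
}
```

Swapping in a component from another source would mean passing a different Component to the Pipeline constructor, which is roughly the kind of extensibility described in the quoted answers.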

RE: Crash course in cTakes [EXTERNAL]

Posted by "Petersam, John Contractor" <Jo...@ssa.gov.INVALID>.
Hi Rik,
I understand your feelings.  Not everyone has the ability to upgrade things.  I was responding because I wanted folks to know it was possible.  My use case is different than what I typically see here.  I've got an enterprise solution that uses cTAKES to process 7 million pages of text daily.  I forked my version years ago because of inefficiencies in certain parts of the library (e.g. the dependency parser) that made it otherwise impractical to deploy for my level of volume.  When version 4 came out, I performed a manual diff of the code to get the updates.  I expect to do the same for version 5 (if for no other reason, to speed up the build time).

Since then, we've performed regular upgrades.  Most of them were pretty straightforward, which is why I don't think getting it past 1.8 would be a huge issue for most developers.  I know I made a few modifications when updating Spring (I believe from 3.x to 4.x, but it's been a while and I'm currently on 6.0.3), and a couple when updating Lucene.  I also know that the most recent upgrade to Hibernate was a bit painful.  In fact, there were a couple of functions in ytex that I simply commented out since they weren't being referenced anyway.  But those are the only "hitches" I've ever had during a migration.

I would strongly suggest anyone doing the upgrade make sure they do a full regression test.  I don't trust standard JUnit tests because of the nature of NLP, so I actually compare annotations on a 3 million page test set to ensure that nothing is broken.  That's probably a larger set than most of you need, but obviously more is better.

Hope this helps,
John
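
As an illustration of the kind of regression check John describes above (comparing annotations before and after an upgrade instead of trusting unit tests alone), here is a minimal sketch.  It is not John's code and not part of cTAKES: the AnnotationRegression class and the "docId|begin|end|type" key format are assumptions made for the example, and how the annotations get exported from each build is left out entirely.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares annotations from a baseline build with those from an upgraded build.
final class AnnotationRegression {

    // Assumed key format for one annotation: document id, character offsets, and a type label.
    static String key(String docId, int begin, int end, String type) {
        return docId + "|" + begin + "|" + end + "|" + type;
    }

    // Returns one line per difference, missing annotations first, each group sorted.
    // An empty list means both runs produced the same set of annotations.
    static List<String> diff(Collection<String> baseline, Collection<String> candidate) {
        Set<String> missing = new TreeSet<>(baseline);
        missing.removeAll(candidate);   // in the old output, gone from the new output

        Set<String> added = new TreeSet<>(candidate);
        added.removeAll(baseline);      // new in the upgraded output

        List<String> report = new ArrayList<>();
        for (String k : missing) {
            report.add("MISSING " + k);
        }
        for (String k : added) {
            report.add("ADDED " + k);
        }
        return report;
    }
}
```

In practice the two collections would be loaded from whatever output format each build writes; an empty report on a large, representative document set is the signal that the upgrade did not change behavior.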


Re: Crash course in cTakes [EXTERNAL]

Posted by Rick Coleman <rc...@jilocasin.net>.
John,

That's good to hear, and as Sean remarked, details would be great.

Unfortunately, I don't think we should be expecting non-dev users to 
have to update dependencies and make code changes since the download 
page said it works with 1.8 or higher....


rik.


Re: Crash course in cTakes [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi John,

Can you share any more details on this?

Thanks,

Sean

RE: Crash course in cTakes [EXTERNAL]

Posted by "Petersam, John Contractor" <Jo...@ssa.gov.INVALID>.
Hi Rik,
I run mine on Java 19, so it can be done.  But I have also updated dependencies and made code modifications to support it.

Thanks,
John


Re: Crash course in cTakes [EXTERNAL]

Posted by Rick Coleman <rc...@jilocasin.net>.
Sean,

Thanks for getting back to me on this.  I was afraid that was what the 
answer was going to be.

I appreciate you taking the time to fill in some of the gaps.  If it's 
so dependent on Java 1.8, someone should probably remove the "or higher" 
on the download page.


I look forward to getting this application up and running.

Until then,

rik.
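
Since the thread turns on which Java version is actually running, here is a small sketch of how a launcher could check it and warn the user.  This is not something cTAKES ships: the JavaVersionCheck class is hypothetical.  It relies only on the standard "java.specification.version" system property, which reports "1.8" on Java 8 and plain numbers such as "17" or "19" on later releases.

```java
// Hypothetical helper, not part of cTAKES: warns when the JVM is not Java 8.
final class JavaVersionCheck {

    // Java 8 and earlier report "1.8", "1.7", ...; Java 9 and later report "9", "11", "17", ...
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2);          // "1.8" -> "8"
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);     // drop any minor part
        }
        return Integer.parseInt(v);
    }

    static boolean isJava8(String specVersion) {
        return majorVersion(specVersion) == 8;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (isJava8(spec)) {
            System.out.println("Java " + spec + " detected.");
        } else {
            // Wording reflects what is reported in this thread, not an official statement.
            System.err.println("Warning: running on Java " + spec
                    + "; an unmodified cTAKES is reported to need Java 1.8.");
        }
    }
}
```

A check like this at startup would give users on a newer JDK an explicit warning rather than an obscure failure further down the line.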

On 2/3/23 15:57, Finan, Sean wrote:
> Hi Rick,
>
> Thank you for the questions and for reminding us that the documentation is sparse, outdated and not very detailed.  Everybody needs a prod now and then to get things done.
>
> I hope that we can get a solid README and Wiki going on GitHub, as well as an update to the primary website.  It will take a lot of work and some cooperation by committers and users alike.
>
> I have tried to address your questions inline below.
>
> Sean
>
> ________________________________
> From: Rick Coleman <rc...@jilocasin.net>
> Sent: Friday, February 3, 2023 3:14 PM
> To: dev@ctakes.apache.org <de...@ctakes.apache.org>
> Subject: Crash course in cTakes [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hello everyone,
>
> Can anyone point me to an exhaustive set of documentation regarding cTakes?
>
>    *   Not really.  The wiki that you found is the most that there is.
>    *   Most information is scattered across emails written on the dev and user lists.  You can search them here:  https://apache.markmail.org/
>
> The main site feels like it was written by a marketing major, lots of
> flash and catchiness, but little in the way of detailed documentation.
> Even the User Install Guide and the Developer Install guide read like
> what they are, install guides.
>
> For example:
> Is cTakes the whole package, or just the front end?
>
>    *   ctakes is a clinical nlp platform (vague enough?).   I would say "whole package", but extendable.
>    *   It is built on Apache UIMA and allows users to create pipelines of various nlp and i/o components.
>    *   It comes with many components that have been built for clinical nlp.
>    *   It is extendable; UIMA components from other sources can be placed in the pipelines.
>    *   There are front-ends for some tasks, such as running a pipeline or creating a custom dictionary.
>
> If it's just the front end, what's the back end?
>
>    *   I would say that each UIMA component is a bit of back-end, as is the controller that actually runs the pipeline.
>    *   As mentioned above, you can extend it with non-ctakes back-end components.
>
> It mentions using my UMLS credentials, can you use a local copy of the
> relevant UMLS data?  If so how?
>
>    *   If you are compiling and running the source then ctakes will automatically download a default dictionary.
>    *   If you are running a packaged binary then you'll need to manually pull down a dictionary.
>    *   Prior to ctakes 5, downloading, unzipping and copying the dictionary was a manual process.
>    *   If you are using v5 then you can run bin/getUmlsDictionary and a simple gui will do it for you.
>    *   You can also create your own custom dictionary.
>    *   The wiki has a page on the dictionary creator gui.
>    *   There are instructions on youtube that start with first steps.
>
> Are the requirements listed, 1GB drive space, Oracle Java 1.8 the
> minimum or the recommended?  What about RAM or CPU? Is non-Oracle Java
> acceptable?  What about 1.17, the current LTS version?
>
>    *   Disk: > 1GB
>    *   Java: == 1.8
>    *   RAM: > 2GB  (>= 4GB recommended)
>    *   CPU: >= 64bit
>    *   OpenJDK seems to be fine.
>
> Every java release past 8 is bad for ctakes.  ctakes has a lot of dependencies, many of which are old and rely on a java 8 feature here and there.  ctakes itself probably requires a java 8 special here and there, but I honestly don't know. Unfortunately, ctakes needs to have a serious update effort - maybe for v6.  Part of the problem is actually its capabilities and versatility - the availability of multiple components and workflows.  A 'minor' change can require a dozen end-to-end tests in dev and user environments on multiple platforms.  Unit tests do not suffice.
>
>
> So, does anyone know where I can find out this information?
>
>
> Thanks.
>
> rik.
>
>

Re: Crash course in cTakes [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi Rick,

Thank you for the questions and for reminding us that the documentation is sparse, outdated and not very detailed.  Everybody needs a prod now and then to get things done.

I hope that we can get a solid README and Wiki going on GitHub, as well as an update to the primary website.  It will take a lot of work and some cooperation by committers and users alike.

I have tried to address your questions inline below.

Sean

________________________________
From: Rick Coleman <rc...@jilocasin.net>
Sent: Friday, February 3, 2023 3:14 PM
To: dev@ctakes.apache.org <de...@ctakes.apache.org>
Subject: Crash course in cTakes [EXTERNAL]


Hello everyone,

Can anyone point me to an exhaustive set of documentation regarding cTakes?

  *   Not really.  The wiki that you found is the most that there is.
  *   Most information is scattered across emails written on the dev and user lists.  You can search them here:  https://apache.markmail.org/

The main site feels like it was written by a marketing major, lots of
flash and catchiness, but little in the way of detailed documentation.
Even the User Install Guide and the Developer Install guide read like
what they are, install guides.

For example:
Is cTakes the whole package, or just the front end?

  *   ctakes is a clinical nlp platform (vague enough?).   I would say "whole package", but extendable.
  *   It is built on Apache UIMA and allows users to create pipelines of various nlp and i/o components.
  *   It comes with many components that have been built for clinical nlp.
  *   It is extendable; UIMA components from other sources can be placed in the pipelines.
  *   There are front-ends for some tasks, such as running a pipeline or creating a custom dictionary.

If it's just the front end, what's the back end?

  *   I would say that each UIMA component is a bit of back-end, as is the controller that actually runs the pipeline.
  *   As mentioned above, you can extend it with non-ctakes back-end components.

It mentions using my UMLS credentials, can you use a local copy of the
relevant UMLS data?  If so how?

  *   If you are compiling and running the source then ctakes will automatically download a default dictionary.
  *   If you are running a packaged binary then you'll need to manually pull down a dictionary.
  *   Prior to ctakes 5, downloading, unzipping and copying the dictionary was a manual process.
  *   If you are using v5 then you can run bin/getUmlsDictionary and a simple gui will do it for you.
  *   You can also create your own custom dictionary.
  *   The wiki has a page on the dictionary creator gui.
  *   There are instructions on youtube that start with first steps.

Are the requirements listed, 1GB drive space, Oracle Java 1.8 the
minimum or the recommended?  What about RAM or CPU? Is non-Oracle Java
acceptable?  What about 1.17, the current LTS version?

  *   Disk: > 1GB
  *   Java: == 1.8
  *   RAM: > 2GB  (>= 4GB recommended)
  *   CPU: >= 64bit
  *   OpenJDK seems to be fine.

Every java release past 8 is bad for ctakes.  ctakes has a lot of dependencies, many of which are old and rely on a java 8 feature here and there.  ctakes itself probably requires a java 8 special here and there, but I honestly don't know. Unfortunately, ctakes needs to have a serious update effort - maybe for v6.  Part of the problem is actually its capabilities and versatility - the availability of multiple components and workflows.  A 'minor' change can require a dozen end-to-end tests in dev and user environments on multiple platforms.  Unit tests do not suffice.
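
If it helps, here is a rough sketch of a standalone check you could run before launching ctakes to see whether the JVM matches the requirements above (Java 1.8, 64bit).  This is not part of ctakes; the class name CtakesEnvCheck and its helper methods are made up for illustration.  It relies only on standard JVM system properties.  The heap figure it prints is the JVM's maximum heap, not the machine's physical RAM, so treat it as a hint only.

```java
// Hypothetical helper, not shipped with ctakes: reports whether the running
// JVM looks like the Java 1.8 / 64bit environment described above.
final class CtakesEnvCheck {

    // Java 8 reports its specification version as "1.8"; later releases
    // report "9", "11", "17" and so on.
    static boolean isJava8(String specVersion) {
        return "1.8".equals(specVersion);
    }

    // Common 64bit values of os.arch include "amd64", "x86_64" and "aarch64".
    static boolean is64Bit(String osArch) {
        return osArch != null && osArch.contains("64");
    }

    // Converts a byte count to whole megabytes for display.
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        String arch = System.getProperty("os.arch");
        long maxHeap = Runtime.getRuntime().maxMemory();

        System.out.println("Java spec version: " + spec
                + (isJava8(spec) ? "  (ok)" : "  (ctakes expects 1.8)"));
        System.out.println("Architecture: " + arch
                + (is64Bit(arch) ? "  (ok)" : "  (64bit expected)"));
        // Max heap can be raised with the standard -Xmx JVM option.
        System.out.println("Max JVM heap: " + toMegabytes(maxHeap) + " MB");
    }
}
```

Compile with javac CtakesEnvCheck.java and run with java CtakesEnvCheck using the same java executable that you intend to use for ctakes.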


So, does anyone know where I can find out this information?


Thanks.

rik.