Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2021/05/20 12:16:07 UTC

Re: Java Question [EXTERNAL]

Hi John,

>Can you help me with a few questions?
- I will try.  Others may offer alternate or additional information.

> If I wanted to create a modified workflow without the entire source code, could I create a jar file of the module I wanted to modify and then replace that jar file in the User Install
- Yes.
In IntelliJ you should be able to use the Maven panel to handle this.  If the panel is not already showing (for example, because you compiled by some other means), open it from the top menu bar: View > Tool Windows > Maven.
The maven panel should display a list (tree) of all of the ctakes modules.  You can build an individual module here.  For instance, Apache cTAKES Dockhand > Lifecycle > package.  You will see some progress information in the run panel, including:
        [INFO] Building jar: C:\Spiffy\ctakes_trunk\ctakes-dockhand\target\ctakes-dockhand-4.0.1-SNAPSHOT.jar
This builds just a single module's jar - in this case the ctakes basic installation gui. The new jar file is written to that module's target directory, at the path shown in the "Building jar" line.

>or would I need to compile the entire source code for cTAKES?
- No, but if you do ever want to build the entire project, use the Maven panel: Apache cTAKES (root) > Lifecycle > package.

>When I try to create a jar file within IntelliJ IDEA, it asks for the main class.
- It shouldn't do this if you use the maven package process as I outlined above.  If you still get a main class question then send me info on your complete process and I'll see if I can duplicate it.

>or should I build a jar file for the modified module without a main class and then replace that in the lib/ folder of the User Install version of cTAKES?
- You should build one without specifying a main class and copy it to lib/ in the User Install.
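
One way to confirm that a freshly packaged module jar is a plain library jar (no Main-Class entry in its manifest) before copying it to lib/ is a small check like the one below. This is a minimal sketch that uses only java.util.jar from the JDK; the names JarMainClassCheck and mainClassOf are hypothetical and not part of cTAKES.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper (not part of cTAKES): reports a jar's Main-Class, if any. */
final class JarMainClassCheck {

   private JarMainClassCheck() {
   }

   /** Returns the Main-Class manifest attribute, or null when the jar declares none. */
   static String mainClassOf( final Path jarPath ) throws IOException {
      try ( JarFile jar = new JarFile( jarPath.toFile() ) ) {
         final Manifest manifest = jar.getManifest();
         if ( manifest == null ) {
            // A jar without any manifest cannot declare a main class.
            return null;
         }
         return manifest.getMainAttributes().getValue( Attributes.Name.MAIN_CLASS );
      }
   }

   public static void main( final String... args ) throws IOException {
      // args[0] is the path to the jar produced by the Maven package step.
      final String mainClass = mainClassOf( Paths.get( args[ 0 ] ) );
      System.out.println( mainClass == null
                          ? "No Main-Class declared (plain library jar)"
                          : "Main-Class: " + mainClass );
   }
}
```

A module jar meant for lib/ would normally be expected to report no Main-Class, since the User Install scripts supply their own entry point.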

> when I try to run cTAKES, I receive the error:
The feature org.apache.ctakes.typesystem.type.textspan.List:items is declared twice, with incompatible multipleReferencesAllowed specifications
- Is it an error that stops ctakes from running or is it just a warning?
The root of the problem is that different modules have copies of the type system xml.  This should be unnecessary and causes this problem if somebody modifies properties in one but not another.
For instance, in ctakes-type-system TypeSystem.xml :
        <featureDescription>
          <name>items</name>
          <description/>
          <rangeTypeName>uima.cas.FSList</rangeTypeName>
          <elementType>uima.tcas.Annotation</elementType>
          <multipleReferencesAllowed>true</multipleReferencesAllowed>
        </featureDescription>
While in others:
        <featureDescription>
          <name>items</name>
          <description/>
          <rangeTypeName>uima.cas.FSList</rangeTypeName>
          <elementType>uima.tcas.Annotation</elementType>
        </featureDescription>
This is essentially a bug that I am "fixing" right .. about .. now ...
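
To find which copies of the type system xml disagree, the two descriptors can be compared feature by feature. The following is a minimal sketch using only the JDK's DOM parser; it assumes the usual typeDescription > features > featureDescription nesting shown above, and FeatureFlagScan, multipleReferenceFlags and conflicts are hypothetical names, not cTAKES or UIMA API. A feature with no multipleReferencesAllowed element is recorded as "(unset)" so that it compares as different from an explicit "true", which matches the mismatch described above.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of cTAKES or UIMA): compares feature flags in two type system files. */
final class FeatureFlagScan {

   private FeatureFlagScan() {
   }

   /** Maps "type:feature" to its multipleReferencesAllowed text, or "(unset)" when the element is absent. */
   static Map<String, String> multipleReferenceFlags( final InputStream xml ) throws Exception {
      final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse( xml );
      final Map<String, String> flags = new LinkedHashMap<>();
      final NodeList types = doc.getElementsByTagName( "typeDescription" );
      for ( int i = 0; i < types.getLength(); i++ ) {
         final Element type = (Element) types.item( i );
         final String typeName = childText( type, "name" );
         final NodeList features = type.getElementsByTagName( "featureDescription" );
         for ( int j = 0; j < features.getLength(); j++ ) {
            final Element feature = (Element) features.item( j );
            final String flag = childText( feature, "multipleReferencesAllowed" );
            flags.put( typeName + ":" + childText( feature, "name" ), flag == null ? "(unset)" : flag );
         }
      }
      return flags;
   }

   /** Returns the "type:feature" keys that appear in both maps with different flag values. */
   static List<String> conflicts( final Map<String, String> first, final Map<String, String> second ) {
      final List<String> differing = new ArrayList<>();
      for ( Map.Entry<String, String> entry : first.entrySet() ) {
         final String other = second.get( entry.getKey() );
         if ( other != null && !other.equals( entry.getValue() ) ) {
            differing.add( entry.getKey() );
         }
      }
      return differing;
   }

   /** Trimmed text of the first direct child element with the given tag, or null if there is none. */
   private static String childText( final Element parent, final String tag ) {
      for ( Node n = parent.getFirstChild(); n != null; n = n.getNextSibling() ) {
         if ( n instanceof Element && tag.equals( ( (Element) n ).getTagName() ) ) {
            return n.getTextContent().trim();
         }
      }
      return null;
   }

   public static void main( final String... args ) throws Exception {
      // args[0] and args[1] are paths to two type system xml files to compare.
      try ( InputStream first = Files.newInputStream( Paths.get( args[ 0 ] ) );
            InputStream second = Files.newInputStream( Paths.get( args[ 1 ] ) ) ) {
         for ( String key : conflicts( multipleReferenceFlags( first ), multipleReferenceFlags( second ) ) ) {
            System.out.println( "Conflicting multipleReferencesAllowed: " + key );
         }
      }
   }
}
```

Running it against the ctakes-type-system TypeSystem.xml and a copy from another module should list any feature, such as List:items, whose flag differs between the two.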

>I haven’t made any modifications to how the type system is called, only in how a custom dictionary is accessed.
- Just out of curiosity, how did you change custom dictionary access?  Maybe we can add it to ctakes.

Sean

________________________________________
From: JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>
Sent: Wednesday, May 19, 2021 5:01 PM
To: dev@ctakes.apache.org
Subject: Java Question [EXTERNAL]

Hello,
The cTAKES User Install is mostly sufficient for my lab, but I’ve found that I need to modify a few of the modules. I downloaded the cTAKES source and can successfully run workflows after updating the source code, but I’m having trouble building the modified modules and essentially creating an updated User Install of cTAKES from the source code. Can you help me with a few questions?


  *   I’m running IntelliJ IDEA, and I can compile the cTAKES source code with build profiles like runPiperGui without problems to run cTAKES programmatically or to start a GUI. I can also run essentially the same workflow in the User Install version by running the bash helper script ‘runPiperFile.sh’. If I wanted to create a modified workflow without the entire source code, could I create a jar file of the module I wanted to modify, and then replace that jar file in the User Install version, or would I need to compile the entire source code for cTAKES?

  *   When I try to create a jar file within IntelliJ IDEA, it asks for the main class. Would this be a class I create for the workflow I’m using, or something else? For example, could I use org.apache.ctakes.examples.pipeline.HelloWorldPiperRunner as a template to build a customized main class that would run programmatically, or should I build a jar file for the modified module without a main class and then replace that in the lib/ folder of the User Install version of cTAKES?

  *   I can create a jar file of a module within IntelliJ IDEA and then replace the modified jar file in the User Install version (for example, modifying ctakes-dictionary-lookup-fast-4.0.0.1.jar and replacing it in the lib/ folder), but when I try to run cTAKES, I receive the error:

The feature org.apache.ctakes.typesystem.type.textspan.List:items is declared twice, with incompatible multipleReferencesAllowed specifications

I haven’t made any modifications to how the type system is called, only in how a custom dictionary is accessed. Do you have any idea what could be causing this error?

Thanks!
John Caskey


Re: Java Question [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi John, I sent a reply directly to jrcaskey at medicine.wisc.edu
Let me know if you don't get it.
Sean

Re: Java Question [EXTERNAL]

Posted by JOHN R CASKEY <jr...@medicine.wisc.edu.INVALID>.
Hi Sean,
Thanks for the reply. I followed the steps you described and was able to create jar files. I've described the steps below that I took to access a custom dictionary, but another problem has come up.

1. Follow the steps on the Wiki to download the UMLS dictionary, then install a custom dictionary as described. For my purposes, I was only interested in Snomed terms in the custom dictionary.
2. Using the cTAKES User install version, run the ./bin/runPiperFile.sh via Bash and specify the custom dictionary location with the -l option. The runClinicalPipeline.sh can also be run similarly.

./bin/runPiperFile.sh -i inputDir --xmiOut outputDir -p resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.mod.piper -l resources/org/apache/ctakes/dictionary/lookup/fast/custom_snomed.xml

This worked without problems, and via the log file I was able to track UMLS authentication, cTAKES accessing the updated custom dictionary descriptor, and outputting XMI files.

Our lab stores data in an environment that has been totally isolated from the Internet for security reasons. Running the same steps in that environment causes the workflow to fail when it can't connect to the UMLS authentication servers. As an ad-hoc solution, I attempted to modify the cTAKES source code so that the isValidUMLSUser() method always returns true in the UmlsJdbcRareWordDictionary, UmlsJdbcConceptFactory, AbstractJCasTermAnnotator, and DefaultJCasTermAnnotator classes:

// example
	public boolean isValidUMLSUser( String umlsUrl, final String vendor, final String user, final String apikey ) {
		return true;
	}

I compiled the code, created a new ctakes-dictionary-lookup-fast.jar file, copied this jar file to a tested User Install version of cTAKES, then modified the dictionary XML file to exclude UMLS authentication credentials. However, it always crashed when the initialize method from the AbstractJCasTermAnnotator class attempted to load the descriptor file path:

// from AbstractJCasTermAnnotator class
// public void initialize( final UimaContext uimaContext ) throws ResourceInitializationException {...}

      LOGGER.info( "Using Dictionary Descriptor: " + descriptorFilePath );
      try ( InputStream descriptorStream = FileLocator.getAsStream( descriptorFilePath ) ) {
         _dictionarySpec = DictionaryDescriptorParser.parseDescriptor( descriptorStream, uimaContext );
      /* it always crashes here with the error ERROR PiperFileRunner - Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: <unknown>)
     */
      } catch ( IOException | AnnotatorContextException multE ) {
         LOGGER.info("AnnotatorContextException, bad InputStream");
         throw new ResourceInitializationException( multE );
      }

If I modify the custom dictionary file to include my UMLS username and API key, the same error occurs. Everything works without errors if I revert to the unmodified jar file with UMLS authentication and my credentials. Is UMLS authentication tied to loading a custom dictionary file in some way that I've missed, and do you have any suggestions on how to use cTAKES with no external internet access?
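
The PiperFileRunner message above only reports that initialization failed; the underlying IOException or AnnotatorContextException is wrapped inside the ResourceInitializationException. A minimal sketch for surfacing the wrapped cause, using only java.lang and java.util, might look like the following. CauseChain, describe and rootCause are hypothetical names, not part of cTAKES or UIMA, and the depth limit is an arbitrary guard against a cyclic cause chain.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of cTAKES or UIMA): lists every exception in a cause chain. */
final class CauseChain {

   // Arbitrary upper bound so that a cyclic cause chain cannot loop forever.
   private static final int MAX_DEPTH = 20;

   private CauseChain() {
   }

   /** Returns one "ExceptionClass: message" line per throwable, outermost first. */
   static List<String> describe( final Throwable error ) {
      final List<String> lines = new ArrayList<>();
      Throwable current = error;
      for ( int depth = 0; current != null && depth < MAX_DEPTH; depth++ ) {
         lines.add( current.getClass().getName() + ": " + current.getMessage() );
         current = current.getCause();
      }
      return lines;
   }

   /** Returns the innermost throwable reachable within MAX_DEPTH steps. */
   static Throwable rootCause( final Throwable error ) {
      Throwable current = error;
      for ( int depth = 0; current.getCause() != null && depth < MAX_DEPTH; depth++ ) {
         current = current.getCause();
      }
      return current;
   }

   public static void main( final String... args ) {
      // Stand-in exceptions only; a real run would pass the exception caught in initialize(..).
      final Exception wrapped = new RuntimeException( "initialization failed",
            new IllegalStateException( "bad descriptor", new java.io.IOException( "file not found" ) ) );
      for ( String line : describe( wrapped ) ) {
         System.out.println( line );
      }
   }
}
```

Logging the lines from describe( multE ) in the catch block, in place of the fixed "bad InputStream" text, would show whether the failure comes from locating the descriptor file or from parsing it.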

Thanks!

John Caskey
