Posted to dev@ctakes.apache.org by Erin Gustafson <er...@northwestern.edu> on 2017/06/08 15:15:50 UTC

Pipeline executable

Hi all,

I have a project that contains a series of classes to build cTAKES pipelines. I've been successfully running the pipelines myself within an IDE, but would like to be able to provide collaborators with an executable jar file to run our pipeline.

So far, I've managed to build a jar that will start running the pipeline from the command line. It successfully initializes the annotators but throws an exception when processing begins:

Exception in thread "main" java.lang.IllegalStateException: org.apache.uima.resource.ResourceInitializationException: Undefined type "org.apache.ctakes.typesystem.type.textspan.Segment" in type priority list. (Descriptor: <unknown>)

Any thoughts about how to resolve this error? Let me know if I can provide any more information.

Thanks,
Erin


RE: Pipeline executable

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Erin,

My apologies for the late reply.  I can't see why your code doesn't find "Segment" especially since the java bytecode class exists.  The only thing that I would try is mucking around with the maven project - reload dependencies, check them in project configuration, etc.  Maybe even starting/importing a new project from scratch could help.
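
One thing that may be worth checking (it is not confirmed anywhere in this thread) is whether the type system descriptors are still visible on the classpath of the packaged jar. The sketch below assumes that uimaFIT locates type systems through classpath files named META-INF/org.apache.uima.fit/types.txt, and that building one jar by extracting every dependency can leave only a single copy of a file that several jars share. The class name ClasspathResourceCheck is made up for illustration. It uses only the JDK and prints every copy of that file it can see, so the output of the IDE run and the jar run can be compared:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class ClasspathResourceCheck {

    // Assumption: uimaFIT discovers type systems through this classpath file.
    static final String TYPES_FILE = "META-INF/org.apache.uima.fit/types.txt";

    /** Returns every copy of the named resource that the class loader can see. */
    static List<URL> findAll(String resourceName) {
        try {
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            if (loader == null) {
                loader = ClassLoader.getSystemClassLoader();
            }
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> found = findAll(TYPES_FILE);
        System.out.println(found.size() + " copies of " + TYPES_FILE);
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

If the IDE run lists several copies and the jar run lists only one, merging the same-named files during packaging would be the next thing to try.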

Sean

-----Original Message-----
From: Erin Gustafson [mailto:erin.gustafson@northwestern.edu] 
Sent: Monday, June 12, 2017 1:09 PM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Hi Sean,

Thanks for those suggestions! I'll work on paring the project down to the bare essentials.

Any thoughts on what could be causing the type error or how to go about diagnosing the problem?

Best,
Erin


-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Thursday, June 08, 2017 11:56 AM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Hi Erin,

This has nothing to do with the topic, but you may want to make the following paths relative:
CollectionReaderFactory.createReaderDescriptionFromPath("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/FilesInDirectoryCollectionReader.xml",
aggregateBuilder.add(DefaultJCasTermAnnotator.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/BsvDictionaryAD.xml"));

Alternatively, can you create a description straight from source and not xml descriptors?
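
As a rough sketch of what bundling those files could look like: the helper below copies a classpath resource to a temporary file and returns its path, so that a method expecting a file path can still be given one when the resource lives inside a jar. The class name BundledResource and the resource name desc/BsvDictionaryAD.xml (assumed to be packaged under desc/ at the root of the jar) are hypothetical. Copying the stream avoids calling new File(url.toURI()) on a jar: URL, which fails with "URI is not hierarchical"; whether that is what happens in the LvgAnnotator case has not been verified here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class BundledResource {

    /** Copies a classpath resource to a temporary file and returns that file's path. */
    static Path extractToTemp(String resourceName, String suffix) {
        ClassLoader loader = BundledResource.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Not on classpath: " + resourceName);
            }
            Path target = Files.createTempFile("pipeline-", suffix);
            target.toFile().deleteOnExit(); // clean up when the JVM exits
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical resource name; adjust to wherever the descriptor is packaged.
        Path descriptor = extractToTemp("desc/BsvDictionaryAD.xml", ".xml");
        System.out.println(descriptor.toAbsolutePath());
    }
}
```

The returned path could then be passed wherever the absolute C:/Users/... path is used now. A descriptor that itself points to other files by path would need those files handled the same way.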

Using the dependencies of ctakes-clinical-pipeline will add a lot of things that your pipeline doesn't need - basically everything that I listed in my previous email and then some.  Your ctakes-clinical-pipeline pom.xml should look more like this:

<dependencies>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-dictionary-lookup-fast</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-preprocessor</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-ne-contexts</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-assertion</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-clinical-pipeline-res</artifactId>
    </dependency>
</dependencies>

A lot shorter, right?  Those dependencies include everything that you really need to run the default clinical pipeline, which is basically what you have for your listed pipeline.

In order to keep your temporal and relation pipelines working, you'll need to add the following (basically a leaner temporal pom.xml segment):
<dependencies>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-temporal-res</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-relation-extractor</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-timeml</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-svmlight</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-tksvmlight</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-type-system</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-crfsuite</artifactId>
    </dependency>
    <dependency>
        <groupId>info.bethard</groupId>
        <artifactId>timenorm</artifactId>
        <version>0.9.5</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.java-diff-utils</groupId>
        <artifactId>diffutils</artifactId>
        <version>1.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-mallet</artifactId>
        <!-->version>2.0.1-SNAPSHOT</version-->
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.7</version>
    </dependency>
</dependencies>

It may take a couple of tries, but you could edit the main pom.xml file (commenting out the unwanted modules), then reimport it into IntelliJ and let it create a trimmer project for you.

Sean

-----Original Message-----
From: Erin Gustafson [mailto:erin.gustafson@northwestern.edu] 
Sent: Thursday, June 08, 2017 11:54 AM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Hi Sean,

I'm building the jar within IntelliJ IDEA. Project Structure>Artifacts>New artifact>Jar>From modules with dependencies... I've made one of my pipeline classes the main class, am extracting to the target JAR, and put the manifest file in my resources directory.

Here's what my pipeline looks like, to give you a sense of my goals:

        CollectionReaderDescription collectionReader = CollectionReaderFactory.createReaderDescriptionFromPath(
                "C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/FilesInDirectoryCollectionReader.xml",
                ConfigParameterConstants.PARAM_INPUTDIR,
                inputDir);

        AggregateBuilder aggregateBuilder = new AggregateBuilder();
        aggregateBuilder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
        aggregateBuilder.add(SentenceDetector.createAnnotatorDescription());
        aggregateBuilder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
//        aggregateBuilder.add(LvgAnnotator.createAnnotatorDescription()); //URI not hierarchical error
        aggregateBuilder.add(ContextDependentTokenizerAnnotator.createAnnotatorDescription());
        aggregateBuilder.add(POSTagger.createAnnotatorDescription());
        aggregateBuilder.add(Chunker.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/model/chunker-model.zip"));
        aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "NP"}, 1));
        aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "PP", "NP"}, 2));
        aggregateBuilder.add(DefaultJCasTermAnnotator.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/BsvDictionaryAD.xml"));
        aggregateBuilder.add(ClearNLPDependencyParserAE.createAnnotatorDescription());
        aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class)); // negation
        aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class, // status
                ContextAnnotator.MAX_LEFT_SCOPE_SIZE_PARAM, 10,
                ContextAnnotator.MAX_RIGHT_SCOPE_SIZE_PARAM, 10,
                "ContextAnalyzerClass", "org.apache.ctakes.necontexts.status.StatusContextAnalyzer",
                "ContextHitConsumerClass", "org.apache.ctakes.necontexts.status.StatusContextHitConsumer"));
        aggregateBuilder.add(SubjectCleartkAnalysisEngine.createAnnotatorDescription());

I'm using gradle, with the following dependencies included:
    compile 'org.apache.ctakes:ctakes-type-system:4.0.0'
    compile 'org.apache.ctakes:ctakes-clinical-pipeline:4.0.0'
    compile 'org.apache.ctakes:ctakes-core:4.0.0'
    compile 'org.apache.ctakes:ctakes:4.0.0'

For the pipeline I'm worried about making portable right now, I do not need relations, temporal information, or coreferences. But there are other pipelines within the project for doing location relation and temporal relation extraction. I'm still learning Java and how to use tools like gradle/maven, so it's definitely possible that I don't need all of those dependencies listed above. I was just erring on the side of getting it to work!

Erin


-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Thursday, June 08, 2017 10:39 AM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Before I dig into the error and all the enigmas of UIMA, how are you building the jar?  Also, what do you need from cTAKES?  If you do not need the higher functions for relations, temporal information, coreferences ... or the sideline items like smoking status, drug-ner, ytex (a big one) ... then you can probably create a jar that is about half that size just by getting rid of their libraries and dependencies.

Sean

-----Original Message-----
From: Erin Gustafson [mailto:erin.gustafson@northwestern.edu] 
Sent: Thursday, June 08, 2017 11:33 AM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Within org.apache.ctakes.typesystem itself there are no classes, but in org.apache.ctakes.typesystem.type.textspan I do see Segment.class.
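
Seeing Segment.class in the jar shows that the compiled class was packaged; it does not by itself show that the XML descriptor declaring the type was packaged too. A small JDK-only sketch for listing what a jar actually contains follows. The class name JarInspector is invented for illustration, and the descriptor file name TypeSystem.xml is an assumption about how the cTAKES type system descriptor is named.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

class JarInspector {

    /** Lists the names of all jar entries that contain the given fragment. */
    static List<String> entriesContaining(String jarPath, String fragment) {
        try (JarFile jar = new JarFile(jarPath)) {
            List<String> matches = new ArrayList<>();
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.contains(fragment)) {
                    matches.add(name);
                }
            }
            return matches;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the Main-Class manifest attribute, or null if there is none. */
    static String mainClass(String jarPath) {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue("Main-Class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String jarPath = args[0]; // path to the packaged pipeline jar
        System.out.println("Main-Class: " + mainClass(jarPath));
        entriesContaining(jarPath, "textspan/Segment").forEach(System.out::println);
        // Assumed descriptor file name; adjust if the descriptor is named differently.
        entriesContaining(jarPath, "TypeSystem.xml").forEach(System.out::println);
    }
}
```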

The jar is indeed huge (1.14 GB). Open to any suggestions for the most efficient way to go about this!

Erin


-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Thursday, June 08, 2017 10:24 AM
To: dev@ctakes.apache.org
Subject: RE: Pipeline executable

Hi Erin,

Do you have any classes in ctakes-type-system (org.apache.ctakes.typesystem)?  It could be that JCasGen needs to be run.

Just out of curiosity, how huge is your jar file?  You may be able to decrease the size ...

Sean


RE: Pipeline executable

Posted by Erin Gustafson <er...@northwestern.edu>.
Hi all,

Has anyone had success bundling a pipeline into an executable jar?

Thanks,
Erin


RE: Pipeline executable

Posted by Erin Gustafson <er...@northwestern.edu>.
Hi Sean,

Thanks for those suggestions! I'll work on paring the project down to the bare essentials.

Any thoughts on what could be causing the type error or how to go about diagnosing the problem?

Best,
Erin


RE: Pipeline executable

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Erin,

This has nothing to do with the topic, but you may want to make the following paths relative:
CollectionReaderFactory.createReaderDescriptionFromPath("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/FilesInDirectoryCollectionReader.xml",
aggregateBuilder.add(DefaultJCasTermAnnotator.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/BsvDictionaryAD.xml"));

Alternatively, can you create a description straight from source and not xml descriptors?
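For the relative-path suggestion, here is a minimal sketch in plain Java. The class name DescriptorPaths and the system property "pipeline.home" are made up for illustration and are not part of cTAKES or uimaFIT. It only shows how a project-relative path could be resolved against a configurable base directory instead of a hard-coded C:/Users/... location.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of cTAKES or uimaFIT.
final class DescriptorPaths {

    private DescriptorPaths() {
    }

    // Resolves a project-relative path (e.g. "desc/BsvDictionaryAD.xml")
    // against the given base directory.
    static Path resolve(Path baseDir, String relative) {
        return baseDir.resolve(relative).normalize();
    }

    // Base directory: the "pipeline.home" system property (an assumed name)
    // if it is set, otherwise the current working directory.
    static Path baseDir() {
        String home = System.getProperty("pipeline.home");
        return Paths.get(home != null ? home : "").toAbsolutePath();
    }

    public static void main(String[] args) {
        Path descriptor = resolve(baseDir(), "src/resources/desc/BsvDictionaryAD.xml");
        // The resulting string could be handed to a method that expects a descriptor path.
        System.out.println(descriptor);
    }
}
```

Collaborators could then start the jar with -Dpipeline.home=<their checkout> rather than editing paths in the source.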

Using the dependencies of ctakes-clinical-pipeline will add a lot of things that you don't need according to your pipeline - basically everything that I wrote in my previous email and then some.  Your ctakes-clinical-pipeline pom.xml should look more like this:

<dependencies>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-dictionary-lookup-fast</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-preprocessor</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-ne-contexts</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-assertion</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-clinical-pipeline-res</artifactId>
    </dependency>
</dependencies>

A lot shorter, right?  Those dependencies include everything that you really need to run the default clinical pipeline, which is basically what you have for your listed pipeline.

In order to keep your temporal and relation pipelines working, you'll need to add the following (basically a leaner temporal pom.xml segment):
<dependencies>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-temporal-res</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.ctakes</groupId>
        <artifactId>ctakes-relation-extractor</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-timeml</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-svmlight</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-tksvmlight</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-type-system</artifactId>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-crfsuite</artifactId>
    </dependency>
    <dependency>
        <groupId>info.bethard</groupId>
        <artifactId>timenorm</artifactId>
        <version>0.9.5</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.java-diff-utils</groupId>
        <artifactId>diffutils</artifactId>
        <version>1.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.cleartk</groupId>
        <artifactId>cleartk-ml-mallet</artifactId>
        <!--<version>2.0.1-SNAPSHOT</version>-->
    </dependency>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.7</version>
    </dependency>
</dependencies>

It may take a couple of tries, but you could edit the main pom.xml file, commenting out the unwanted modules, then reimport into IntelliJ and let it create a trimmer project for you.

Sean


RE: Pipeline executable

Posted by Erin Gustafson <er...@northwestern.edu>.
Hi Sean,

I'm building the jar within IntelliJ IDEA. Project Structure>Artifacts>New artifact>Jar>From modules with dependencies... I've made one of my pipeline classes the main class, am extracting to the target JAR, and put the manifest file in my resources directory.
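A possible thing to check with the "extract to the target JAR" option: uimaFIT locates type system descriptors through classpath files named META-INF/org.apache.uima.fit/types.txt, and when several jars are unpacked into one, files that share a path can overwrite one another. Whether that is what happens here is only an assumption. The sketch below uses plain JDK calls (the class name ClasspathResourceFinder is made up) to list every copy of that file visible on the classpath, so the count in the IDE can be compared with the count from the built jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic for illustration; not part of cTAKES or uimaFIT.
final class ClasspathResourceFinder {

    // Resource name that uimaFIT scans for type system locations.
    static final String UIMAFIT_TYPES = "META-INF/org.apache.uima.fit/types.txt";

    private ClasspathResourceFinder() {
    }

    // Returns every classpath entry that provides the named resource.
    static List<URL> findAll(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> found = findAll(UIMAFIT_TYPES);
        System.out.println(found.size() + " copies of " + UIMAFIT_TYPES);
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

If the IDE run reports several copies and the single jar reports one, the merged jar has probably lost some entries, and the files would need to be concatenated rather than overwritten when the jar is built.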

Here's what my pipeline looks like, to give you a sense of my goals:

CollectionReaderDescription collectionReader = CollectionReaderFactory.createReaderDescriptionFromPath("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/FilesInDirectoryCollectionReader.xml",
        ConfigParameterConstants.PARAM_INPUTDIR,
        inputDir);

AggregateBuilder aggregateBuilder = new AggregateBuilder();
aggregateBuilder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
aggregateBuilder.add(SentenceDetector.createAnnotatorDescription());
aggregateBuilder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
// aggregateBuilder.add(LvgAnnotator.createAnnotatorDescription()); // URI not hierarchical error
aggregateBuilder.add(ContextDependentTokenizerAnnotator.createAnnotatorDescription());
aggregateBuilder.add(POSTagger.createAnnotatorDescription());
aggregateBuilder.add(Chunker.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/model/chunker-model.zip"));
aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "NP"}, 1));
aggregateBuilder.add(ChunkAdjuster.createAnnotatorDescription(new String[] {"NP", "PP", "NP"}, 2));
aggregateBuilder.add(DefaultJCasTermAnnotator.createAnnotatorDescription("C:/Users/eng148/Documents/GitHub/cTAKESPipelines/src/resources/desc/BsvDictionaryAD.xml"));
aggregateBuilder.add(ClearNLPDependencyParserAE.createAnnotatorDescription());
aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class)); // negation
aggregateBuilder.add(AnalysisEngineFactory.createEngineDescription(ContextAnnotator.class, // status
        ContextAnnotator.MAX_LEFT_SCOPE_SIZE_PARAM, 10,
        ContextAnnotator.MAX_RIGHT_SCOPE_SIZE_PARAM, 10,
        "ContextAnalyzerClass", "org.apache.ctakes.necontexts.status.StatusContextAnalyzer",
        "ContextHitConsumerClass", "org.apache.ctakes.necontexts.status.StatusContextHitConsumer"));
aggregateBuilder.add(SubjectCleartkAnalysisEngine.createAnnotatorDescription());

I'm using gradle, with the following dependencies included:
    compile 'org.apache.ctakes:ctakes-type-system:4.0.0'
    compile 'org.apache.ctakes:ctakes-clinical-pipeline:4.0.0'
    compile 'org.apache.ctakes:ctakes-core:4.0.0'
    compile 'org.apache.ctakes:ctakes:4.0.0'

For the pipeline I'm worried about making portable right now, I do not need relations, temporal information, or coreferences. But there are other pipelines within the project for doing location relation and temporal relation extraction. I'm still learning Java and how to use tools like gradle/maven, so it's definitely possible that I don't need all of those dependencies listed above. I was just erring on the side of getting it to work!

Erin


RE: Pipeline executable

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Before I dig into the error and all the enigmas of UIMA, how are you building the jar?  Also, what do you need from cTAKES?  If you do not need the higher functions for relations, temporal information, coreferences ... or the sideline items like smoking status, drug-ner, ytex (a big one) ... then you can probably create a jar that is about half that size just by getting rid of their libraries and dependencies.

Sean


RE: Pipeline executable

Posted by Erin Gustafson <er...@northwestern.edu>.
Within org.apache.ctakes.typesystem itself there are no classes, but in org.apache.ctakes.typesystem.type.textspan I do see Segment.class.

The jar is indeed huge (1.14 GB). Open to any suggestions for the most efficient way to go about this!

Erin


RE: Pipeline executable

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Erin,

Do you have any classes in ctakes-type-system (org.apache.ctakes.typesystem)?  It could be that jcasgen needs to be run.

Just out of curiosity, how huge is your jar file?  You may be able to decrease the size ...
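One way to answer the "do you have any classes" question from inside the running jar, sketched with plain JDK calls (ClassLocator is a made-up name, not a cTAKES utility): check whether a class can be loaded and print which jar or directory it came from. Note that this only shows the generated JCas class is visible; it says nothing about whether the matching type descriptor was registered with UIMA.

```java
import java.security.CodeSource;

// Hypothetical diagnostic for illustration; not part of cTAKES or UIMA.
final class ClassLocator {

    private ClassLocator() {
    }

    // True if the named class can be found by this class's class loader.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassLocator.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Describes where the class was loaded from, or "(not found)".
    static String locationOf(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClassLocator.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the exception message in this thread.
        String name = args.length > 0 ? args[0] : "org.apache.ctakes.typesystem.type.textspan.Segment";
        System.out.println(name + " loadable: " + isLoadable(name));
        System.out.println("loaded from: " + locationOf(name));
    }
}
```

Running it with the jar on the classpath (java -cp <your jar> ClassLocator) should show whether Segment resolves and from where.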

Sean
