Posted to dev@ctakes.apache.org by "Chen, Pei" <Pe...@childrens.harvard.edu> on 2013/06/11 19:49:58 UTC

InputSteam instead of java.io.File

While working on the test cases in cTAKES, I've encountered a couple of issues and have some suggestions:

1)      Using File or Url.getRawPath() becomes problematic when resources are read from jars on the classpath, because they cannot be resolved to a physical File.

a.       Suggestion: Wherever possible, load resources via InputStream instead of java.io.File. We can add a new method in the FileLocator util and deprecate the old File method.

2)      Sentence Detector is still using the OpenNLP 1.4 mechanism of loading its model file.

a.       Suggestion: Let's update it to use the new 1.5 way, similar to POSTagger.  (Remove classes that are no longer required: SuffixMaxentModelResourceImpl, MaxentModelResource, SuffixSensitiveGISModelReader, etc.)
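
A minimal sketch of what suggestion 1a could look like, using only the JDK. The class name ResourceLocatorSketch and both method names are hypothetical, chosen for illustration; this is not the actual cTAKES FileLocator API. It assumes the intent is to try the classpath first (which also covers entries inside jars) and fall back to the file system.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

/** Hypothetical stand-in for a locator utility; not the real cTAKES FileLocator. */
final class ResourceLocatorSketch {

    private ResourceLocatorSketch() {
    }

    /**
     * Old style: only works when the resource is a real file on disk.
     *
     * @deprecated prefer {@link #openStream(String)}, which also works for jar entries.
     */
    @Deprecated
    static File locateFile(String location) throws FileNotFoundException {
        File file = new File(location);
        if (!file.exists()) {
            throw new FileNotFoundException("No such file on disk: " + location);
        }
        return file;
    }

    /** New style: try the classpath first, then fall back to the file system. */
    static InputStream openStream(String location) throws FileNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceLocatorSketch.class.getClassLoader();
        }
        InputStream fromClasspath =
                loader != null ? loader.getResourceAsStream(location) : null;
        if (fromClasspath != null) {
            return fromClasspath;
        }
        // Not on the classpath: treat the location as a plain file path.
        return new FileInputStream(locateFile(location));
    }
}
```

Callers that only need to read the content never see a File, so the same code path should work whether the resource sits in a directory or inside a jar. The caller is responsible for closing the returned stream.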

Background:
Certain unit tests fail because their resources can't be resolved from jars on the classpath: the code explicitly looks for a File on disk instead of reading from an input stream.  Solving this properly had a cascading effect and required a lot more changes, but it's probably a good time to update those projects anyhow.
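
The failure mode described in the background can be reproduced with the JDK alone. The sketch below is an illustration under stated assumptions, not cTAKES code: the class name JarResourceDemo and the entry name "model.txt" are made up. It builds a throwaway jar, looks up an entry through a class loader, and compares File-based access with stream-based access.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class JarResourceDemo {

    /** Builds a temporary jar containing a single text entry named "model.txt". */
    static Path buildJar() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        jar.toFile().deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("model.txt"));
            out.write("fake model data".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }

    /** File-based access: a jar: URL does not map to a physical file. */
    static boolean resolvesToFile(URL url) {
        try {
            return new File(url.toURI()).exists();
        } catch (Exception e) {
            // For jar: URLs, new File(URI) rejects the URI as not hierarchical.
            return false;
        }
    }

    /** Stream-based access: works wherever the resource lives. */
    static String readFirstLine(URL url) throws IOException {
        try (InputStream in = url.openStream();
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        Path jar = buildJar();
        try (URLClassLoader loader =
                     new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
            URL url = loader.getResource("model.txt");
            System.out.println("resource URL: " + url);
            System.out.println("resolves to a File: " + resolvesToFile(url));
            System.out.println("first line via stream: " + readFirstLine(url));
        }
    }
}
```

The File-based check is expected to report false for the jar entry, while the stream-based read returns the entry's content. This is the reason for preferring InputStream wherever the code only needs to read the resource.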

--Pei


RE: InputSteam instead of java.io.File

Posted by William Karl Thompson <wk...@northwestern.edu>.
Hi Pei,

Great, I'll create a Jira item for the resource loading issue, and will also assemble a Groovy DSL project for the sandbox.

Cheers,

Will



RE: InputSteam instead of java.io.File

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Hi Will,
Yes, this would be very interesting and a good place for the sandbox.
To get started, feel free to create a Jira item for this and, if you like, attach the code to it.

Surprised you weren't a committer already :-).
--Pei



RE: InputSteam instead of java.io.File

Posted by William Karl Thompson <wk...@northwestern.edu>.
Hi Pei,

On another note, I've been working on a Groovy-based domain-specific language (DSL) for UIMA/cTAKES that I think has some nice features that would be of interest to the community. It allows for quick development of compact and powerful rule-based annotators. As I'm working on this, I'm integrating it with the cTAKES type system. Is there a cTAKES sandbox repository for such projects?

Thanks!

Will



RE: InputSteam instead of java.io.File

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
I believe that in OpenNLP 1.5 and above the model metadata is part of the model/zip file, so I was thinking of making it even simpler.
It should be as simple as:
SentenceModel model = new SentenceModel(InputStream);

--Pei


RE: InputSteam instead of java.io.File

Posted by William Karl Thompson <wk...@northwestern.edu>.
Issue (1) is something I've encountered too, in the SuffixMaxentModelResourceImpl class. There is a call to DataResource.getUrl() which doesn't work if the resource is located in a jar file. Replacing this with the following code (starting on line 55) fixed the problem:

    // File modelFile = new File(dr.getUri());
    InputStream is = dr.getInputStream();
    DataReader dataReader = new PlainTextFileDataReader(is);
    GISModelReader modelReader = new GISModelReader(dataReader);
    iv_maxentModel = modelReader.getModel();
    is.close();

