Posted to dev@ctakes.apache.org by "Abramowitsch, Peter" <pa...@hearst.com> on 2016/07/13 18:32:58 UTC
Help needed with document creation time/date
Hello All
How can I get cTAKES to deduce the document creation date/time from the text? I have a pipeline that includes the following engines:
Basic Token Processing
FastUMLS
Zoner
ClearNLPDependencyParserAE
PolarityCleartkAnalysisEngine
UncertaintyCleartkAnalysisEngine
HistoryCleartkAnalysisEngine
ConditionalCleartkAnalysisEngine
GenericCleartkAnalysisEngine
SubjectCleartkAnalysisEngine
EventAnnotator
AnalysisEngineFactory.createEngineDescription(CopyPropertiesToTemporalEventAnnotator.class)
DocTimeRelAnnotator
BackwardsTimeAnnotator
EventTimeRelationAnnotator
EventEventRelationAnnotator
I see that there is a DocumentCreationTime type, but it seems to be initialized from inside one of the ClearTK annotators.
I cannot find any documentation, and I don't know whether it is looking for particular manifestations in the text or whether a property needs to be set externally on the JCas or one of the Sofas.
Any help out there? Examples?
Many thanks,
Peter
RE: Help needed with document creation time/date
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Sounds cool!
-----Original Message-----
From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
Sent: Wednesday, July 13, 2016 3:28 PM
To: dev@ctakes.apache.org
Subject: Re: Help needed with document creation time/date
Thank you!
Knowing now that it's a do-it-yourself thing, I might also try adding those expressions to a RegexAnnotator that I'm using; it allows the expressions to be added and combined in an external file.
https://logiciels.lina.univ-nantes.fr/redmine/.../uima-tokens-regex-documentation.pdf
It took me a while to get it working, but it supports a kind of meta-regex that matches not only strings but also attribute values such as POS, much like Stanford's TokensRegex.
- Peter
On 7/13/16, 12:19 PM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:
>Pattern.compile( ".*Principal Date\\D+(\\d+) (\\d+).*", DOTALL );
RE: Help needed with document creation time/date
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Just to be clear, DATE_PATTERN is whatever regex you use. For instance:
private static final Pattern DATE_PATTERN = Pattern.compile( ".*Principal Date\\D+(\\d+) (\\d+).*", Pattern.DOTALL );
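To see what this pattern actually does, here is a small standalone check. The note text is made up for illustration (a hypothetical header line of the form "Principal Date: <date> <time>"), and the class and method names are hypothetical; adjust the regex to whatever your notes really contain. Because matches() has to cover the whole document, the pattern is wrapped in ".*" and needs Pattern.DOTALL so that "." also crosses line breaks; the second pattern drops the flag to show the difference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DatePatternDemo {
    // Regex from the thread; DOTALL lets ".*" span the newlines of a multi-line note.
    static final Pattern DATE_PATTERN =
            Pattern.compile(".*Principal Date\\D+(\\d+) (\\d+).*", Pattern.DOTALL);

    // Same regex without DOTALL, to show why the flag matters for matches().
    static final Pattern NO_DOTALL =
            Pattern.compile(".*Principal Date\\D+(\\d+) (\\d+).*");

    /** Returns {begin, end} covering "date time", or null if the whole text does not match. */
    static int[] findDocTimeSpan(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        if (!m.matches()) {
            return null;
        }
        // Start of the date group through the end of the time group.
        return new int[] { m.start(1), m.end(2) };
    }

    public static void main(String[] args) {
        // Hypothetical note text; real headers will differ.
        String note = "PATIENT: Doe, Jane\n"
                + "Principal Date: 20160713 1432\n"
                + "HPI: Patient presents with ...\n";

        int[] span = findDocTimeSpan(DATE_PATTERN, note);
        if (span != null) {
            System.out.println("span=" + span[0] + "-" + span[1]
                    + " text=" + note.substring(span[0], span[1]));
        }
        System.out.println("without DOTALL: " + (findDocTimeSpan(NO_DOTALL, note) != null));
    }
}
```

The begin/end offsets returned here are the same values that would go into a TimeMention's setBegin/setEnd.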
-----Original Message-----
From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
Sent: Wednesday, July 13, 2016 3:04 PM
To: dev@ctakes.apache.org
Subject: Re: Help needed with document creation time/date
Got it. Thanks
On 7/13/16, 12:00 PM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:
>DATE_PATTERN.matcher
RE: Help needed with document creation time/date
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Basically, you just want to create a TimeMention.
Here is a short example:
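// Inside your annotator's process( JCas jcas ) method.
// Needs java.util.regex.Matcher and org.apache.ctakes.typesystem.type.textsem.TimeMention.
final String docText = jcas.getDocumentText();
final Matcher dateMatcher = DATE_PATTERN.matcher( docText );
if ( dateMatcher.matches() ) {
    // Span runs from the start of group 1 (date) to the end of group 2 (time)
    final TimeMention docTime = new TimeMention( jcas );
    docTime.setBegin( dateMatcher.start( 1 ) );
    docTime.setEnd( dateMatcher.end( 2 ) );
    // id 0 is what marks this TimeMention as the document time for later lookup
    docTime.setId( 0 );
    docTime.addToIndexes();
}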
If you do want to use the org.cleartk.timeml.type.DocumentCreationTime class, you can. For later retrieval, a TimeMention has to be identified by its type and its id, whereas a DocumentCreationTime can be identified by its type alone.
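The difference between the two lookups can be modelled in plain Java. This is a sketch only: Mention is a hypothetical stand-in for annotations read back from the CAS (in a real pipeline you would iterate the JCas index, for example with uimaFIT's JCasUtil.select( jcas, TimeMention.class )). The id-0 convention comes from the example above; since UIMA int features default to 0 when never set, it is worth checking that no other TimeMention in your pipeline ends up with the same id.

```java
import java.util.List;
import java.util.Optional;

public class DocTimeLookup {
    // Hypothetical stand-in for an annotation in the CAS: type name, id and covered text.
    record Mention(String type, int id, String text) {}

    // Convention from the example: the document time is the TimeMention with id 0.
    static final int DOC_TIME_ID = 0;

    // TimeMention route: ordinary temporal expressions are TimeMentions too,
    // so the type alone is not enough and the id has to be checked.
    static Optional<Mention> findByTypeAndId(List<Mention> mentions) {
        return mentions.stream()
                .filter(m -> m.type().equals("TimeMention") && m.id() == DOC_TIME_ID)
                .findFirst();
    }

    // DocumentCreationTime route: the type itself identifies the annotation.
    static Optional<Mention> findByType(List<Mention> mentions) {
        return mentions.stream()
                .filter(m -> m.type().equals("DocumentCreationTime"))
                .findFirst();
    }
}
```

The trade-off is that the TimeMention route reuses an existing cTAKES type but relies on an id convention every consumer must know, while a dedicated type is self-describing.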
Sean
-----Original Message-----
From: Abramowitsch, Peter [mailto:pabramowitsch@hearst.com]
Sent: Wednesday, July 13, 2016 2:47 PM
To: dev@ctakes.apache.org
Subject: Re: Help needed with document creation time/date
Thanks Sean. Great advice.
I have a regexNER, but didn't go that route because it looked as if there was an inbuilt mechanism waiting to be activated.
Say I know the time from some external source, is there a kosher way I can inject it into the CAS as a creation time property so that it can be retrieved later by a client that knows only the serialized CAS?
Peter
On 7/13/16, 11:41 AM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:
>Hi Peter,
>
>Our group has used two different approaches, depending upon the note type:
>1. Use a custom AE that creates creation time based upon a regex.
>This works well for notes that have a header or footer with a known format.
>2. Use the last normalized temporal expression. For my test notes
>this worked more frequently than you would think (~90%), but I would
>not go this route unless you have thoroughly thought about what is in
>your notes and how you are going to use the document creation time.
>
>That is all that we've done with respect to getting the creation time
>from the actual text. If you have any kind of structured data tied to
>the note that indicates date, then you can tie things (e.g. doctimerel,
>doctime) together post-process. We are doing this in one project.
>
>Sean
RE: Help needed with document creation time/date
Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Peter,
Our group has used two different approaches, depending upon the note type:
1. Use a custom AE that creates the creation time based upon a regex. This works well for notes that have a header or footer with a known format.
2. Use the last normalized temporal expression. For my test notes this worked more frequently than you would think (~90%), but I would not go this route unless you have thoroughly thought about what is in your notes and how you are going to use the document creation time.
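Approach 2 boils down to "take the last temporal expression in the text that has a normalized value". A minimal plain-Java sketch, assuming the temporal expressions have already been pulled out of the CAS into a hypothetical Timex record (character offset, covered text, normalized value such as an ISO date, or null when normalization failed); how the normalized value is obtained depends on your pipeline and is not shown.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LastTimexHeuristic {
    // Hypothetical view of a temporal expression: offset in the note, covered text,
    // and normalized value (null when the expression could not be normalized).
    record Timex(int begin, String text, String normalized) {}

    // Heuristic guess: the normalized value of the last normalizable expression in the text.
    static Optional<String> guessDocTime(List<Timex> timexes) {
        return timexes.stream()
                .filter(t -> t.normalized() != null)
                .max(Comparator.comparingInt(Timex::begin))
                .map(Timex::normalized);
    }
}
```

Skipping unnormalized expressions matters: a trailing "today" or "last week" with no resolved value would otherwise win and give nothing usable. As noted above, validate this against your own notes before relying on it.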
That is all that we've done with respect to getting the creation time from the actual text. If you have any kind of structured data tied to the note that indicates its date, then you can tie things (e.g. doctimerel, doctime) together in post-processing. We are doing this in one project.
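One way to picture that post-processing step: once the note date is known from structured metadata, an event's DocTimeRel label can be turned into a statement about an actual date. This is a hypothetical sketch, assuming THYME-style labels (BEFORE, OVERLAP, AFTER, BEFORE/OVERLAP) and a LocalDate taken from your own metadata; the class and method names are illustrative only.

```java
import java.time.LocalDate;

public class DocTimeAnchor {
    // Anchors an event's DocTimeRel label on the note date from structured data
    // (not from the note text). Labels are assumed to be THYME-style strings.
    static String anchor(String docTimeRel, LocalDate noteDate) {
        return switch (docTimeRel) {
            case "BEFORE" -> "before " + noteDate;
            case "OVERLAP" -> "overlapping " + noteDate;
            case "AFTER" -> "after " + noteDate;
            case "BEFORE/OVERLAP" -> "before or overlapping " + noteDate;
            default -> "unknown relation to " + noteDate;
        };
    }
}
```

LocalDate prints in ISO form, so anchor("BEFORE", LocalDate.of(2016, 7, 13)) yields "before 2016-07-13".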
Sean