Posted to dev@ctakes.apache.org by "Abramowitsch, Peter" <pa...@hearst.com> on 2016/07/13 18:32:58 UTC

Help needed with document creation time/date

Hello All

How can I get cTAKES to deduce the document creation datetime from the text?  I have a pipeline including the following engines:

Basic Token Processing
FastUMLS
Zoner
ClearNLPDependencyParserAE
PolarityCleartkAnalysisEngine
UncertaintyCleartkAnalysisEngine
HistoryCleartkAnalysisEngine
ConditionalCleartkAnalysisEngine
GenericCleartkAnalysisEngine
SubjectCleartkAnalysisEngine
EventAnnotator
AnalysisEngineFactory.createEngineDescription(CopyPropertiesToTemporalEventAnnotator.class)
DocTimeRelAnnotator
BackwardsTimeAnnotator
EventTimeRelationAnnotator
EventEventRelationAnnotator


I see that there is a DocumentCreationTime type, but it seems to be initialized from inside one of the ClearTKAnnotators.

I cannot find any documentation and don't know if it is looking for particular manifestations in the text or whether a property needs to be set externally on the JCAS or one of the SOFAs.


Any help out there? Examples?


Many thanks,

Peter

RE: Help needed with document creation time/date

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Sounds cool!



Re: Help needed with document creation time/date

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Thank you!

Knowing now that it's a do-it-yourself thing, I might also try adding
those expressions to a RegexAnnotator that I'm using:  It allows the
expressions to be added and combined in an external file.

https://logiciels.lina.univ-nantes.fr/redmine/.../uima-tokens-regex-documentation.pdf

It took me a while to get it working, but it allows a kind of meta regex
that matches not only against strings but also against the values of
attributes such as POS, much like Stanford's TokensRegex.

- Peter



On 7/13/16, 12:19 PM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:

>Pattern.compile( ".*Principal Date\\D+(\\d+) (\\d+).*", DOTALL );


RE: Help needed with document creation time/date

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Just to be clear, DATE_PATTERN is whatever regex you use.  For instance:

   private static final Pattern DATE_PATTERN = Pattern.compile( ".*Principal Date\\D+(\\d+) (\\d+).*", Pattern.DOTALL );
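
To make the behaviour of that expression concrete, here is a minimal standalone sketch using only java.util.regex.  The class name DocTimeSpanDemo, the method findDocTimeSpan and the sample note text are invented for illustration and are not part of cTAKES; the sketch only shows which character offsets the two digit groups would give for such a header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of cTAKES.
final class DocTimeSpanDemo {

   // Pattern.DOTALL lets ".*" run across line breaks, so the header may sit anywhere in the note.
   private static final Pattern DATE_PATTERN
         = Pattern.compile( ".*Principal Date\\D+(\\d+) (\\d+).*", Pattern.DOTALL );

   /**
    * Returns {begin, end} character offsets covering both digit groups,
    * or null when the note contains no matching header.
    */
   static int[] findDocTimeSpan( final String docText ) {
      final Matcher matcher = DATE_PATTERN.matcher( docText );
      if ( !matcher.matches() ) {
         return null;
      }
      return new int[]{ matcher.start( 1 ), matcher.end( 2 ) };
   }

   public static void main( final String[] args ) {
      // Sample note text, made up for this example.
      final String note = "Patient note\nPrincipal Date: 20160713 1432\nBody text";
      final int[] span = findDocTimeSpan( note );
      System.out.println( span == null ? "no match" : note.substring( span[ 0 ], span[ 1 ] ) );
   }
}
```

For the sample note, the span covers the text "20160713 1432".  Because the pattern is applied with matches(), it has to account for the entire document, which is why it begins and ends with ".*".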




Re: Help needed with document creation time/date

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Got it.  Thanks

On 7/13/16, 12:00 PM, "Finan, Sean" <Se...@childrens.harvard.edu>
wrote:

>DATE_PATTERN.matcher


RE: Help needed with document creation time/date

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Basically, you just want to create a TimeMention.

Here is a short example:

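      // Match the whole note text against the header/footer pattern.
      final String docText = jcas.getDocumentText();
      final Matcher dateMatcher = DATE_PATTERN.matcher( docText );
      if ( dateMatcher.matches() ) {
         // Span the annotation from the start of group 1 to the end of group 2.
         final TimeMention docTime = new TimeMention( jcas );
         docTime.setBegin( dateMatcher.start( 1 ) );
         docTime.setEnd( dateMatcher.end( 2 ) );
         // A fixed id lets later code pick this TimeMention out as the document time.
         docTime.setId( 0 );
         docTime.addToIndexes();
      }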

If you would rather use the org.cleartk.timeml.type.DocumentCreationTime class, you can do so.  For later retrieval, a TimeMention has to be found by its class type and id, whereas a DocumentCreationTime can be found by class type alone.

Sean



Re: Help needed with document creation time/date

Posted by "Abramowitsch, Peter" <pa...@hearst.com>.
Thanks Sean.  Great advice.

I have a regexNER, but didn't go that route because it looked as if there
was an inbuilt mechanism waiting to be activated.
Say I know the time from some external source, is there a kosher way I can
inject it into the CAS as a creation time property so that it can be
retrieved later by a client that knows only the serialized CAS?

Peter



RE: Help needed with document creation time/date

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Peter,

Our group has used two different approaches, depending upon the note type:
1.  Use a custom AE that sets the creation time based upon a regex.  This works well for notes that have a header or footer with a known format.
2.  Use the last normalized temporal expression in the note.  For my test notes this worked more often than you would think (~90%), but I would not go this route unless you have thought thoroughly about what is in your notes and how you are going to use the document creation time.
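
As a rough illustration of approach 2, the selection step could look something like the sketch below.  It is plain Java rather than cTAKES code: the Timex class is a hypothetical stand-in for whatever temporal expression annotation the pipeline produces, and "last" is assumed here to mean the expression with the greatest begin offset among those that carry a normalized value.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical demo class; not part of cTAKES.
final class LastTimexDemo {

   /** Hypothetical stand-in for a temporal expression annotation. */
   static final class Timex {
      final int begin;
      final int end;
      final String normalized;   // e.g. "2016-07-13", or null if normalization failed

      Timex( final int begin, final int end, final String normalized ) {
         this.begin = begin;
         this.end = end;
         this.normalized = normalized;
      }
   }

   /** Picks the normalized expression that starts latest in the document, if there is one. */
   static Optional<Timex> lastNormalized( final List<Timex> timexes ) {
      return timexes.stream()
                    .filter( t -> t.normalized != null )
                    .max( Comparator.comparingInt( t -> t.begin ) );
   }

   public static void main( final String[] args ) {
      // Offsets and values made up for this example.
      final List<Timex> timexes = Arrays.asList(
            new Timex( 40, 50, "2016-07-01" ),
            new Timex( 310, 320, "2016-07-13" ),
            new Timex( 400, 409, null ) );
      lastNormalized( timexes ).ifPresent( t -> System.out.println( t.normalized ) );
   }
}
```

Returning an Optional keeps the "no usable expression" case explicit, so the caller has to decide what to do with notes where this heuristic finds nothing.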

That is all that we've done with respect to getting the creation time from the actual text.  If you have any kind of structured data tied to the note that indicates date, then you can tie things (e.g. doctimerel, doctime) together post-process.  We are doing this in one project.

Sean
