Posted to user@ctakes.apache.org by David Kincaid <ki...@gmail.com> on 2013/09/01 23:41:51 UTC

RTF Annotator?

Before I embark on building an RTF annotator I thought I'd ask around a bit
to see if anyone had built such a thing. Most of the medical notes I have
to handle are in RTF format. I can pretty easily extract the text only
using something like Apache Tika, but there is important information in the
formatting as well (bold, italic, font sizes, centering, tables, etc.) that
I'd like to use. Is anyone aware of a UIMA annotator that does this already?

Thanks,

Dave Kincaid
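
One way to picture what such an annotator would produce: the plain text plus character-offset spans for the formatting, which is the same begin/end shape that UIMA annotations use. The Python sketch below is only a toy under stated assumptions, not a real RTF parser and not part of cTAKES or Tika. It handles a tiny subset of RTF (groups, \b, \i, \plain, \par, \line, \tab, \'hh escapes) and skips a few destination groups. The names rtf_to_text_and_spans and Span are made up for illustration.

```python
import re
from dataclasses import dataclass

# Destinations whose group content is not document text (assumed short list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}

TOKEN = re.compile(
    r"\\'([0-9a-fA-F]{2})"        # hex-escaped byte, e.g. \'e9
    r"|\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\([^a-zA-Z])"             # control symbol, e.g. \\ \{ \} \*
    r"|([{}])"                    # group open / close
    r"|([^\\{}]+)"                # run of plain text
)


@dataclass(frozen=True)
class Span:
    begin: int   # offset into the extracted plain text
    end: int     # exclusive end offset
    style: str   # "bold" or "italic"


def rtf_to_text_and_spans(rtf):
    """Return (plain_text, spans) for a very small subset of RTF."""
    out, runs, open_run = [], [], {}
    pos = 0
    state = {"b": False, "i": False, "skip": False}
    stack = []

    def emit(chunk):
        nonlocal pos
        if state["skip"] or not chunk:
            return
        for key, name in (("b", "bold"), ("i", "italic")):
            if state[key]:
                prev = open_run.get(name)
                if prev is not None and prev[1] == pos:
                    prev[1] = pos + len(chunk)          # extend the adjacent run
                else:
                    open_run[name] = [pos, pos + len(chunk)]
                    runs.append((name, open_run[name]))
        out.append(chunk)
        pos += len(chunk)

    for m in TOKEN.finditer(rtf):
        hexbyte, word, param, symbol, brace, plain = m.groups()
        if brace == "{":
            stack.append(dict(state))                   # formatting is scoped to the group
        elif brace == "}":
            if stack:
                state = stack.pop()
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                state["skip"] = True
            elif word in ("b", "i"):
                state[word] = param != "0"              # \b turns on, \b0 turns off
            elif word == "plain":
                state["b"] = state["i"] = False
            elif word in ("par", "line"):
                emit("\n")
            elif word == "tab":
                emit("\t")
        elif symbol is not None:
            if symbol == "*":
                state["skip"] = True                    # {\*\dest ...} is ignorable
            elif symbol in "\\{}":
                emit(symbol)                            # escaped literal character
        elif hexbyte is not None:
            emit(bytes([int(hexbyte, 16)]).decode("cp1252", errors="replace"))
        elif plain:
            emit(plain.replace("\r", "").replace("\n", ""))  # raw newlines are not text

    spans = [Span(b, e, name) for name, (b, e) in runs]
    return "".join(out), spans


if __name__ == "__main__":
    sample = (r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0 {\b HISTORY}\par "
              r"Patient reports {\i mild} pain.\par}")
    text, spans = rtf_to_text_and_spans(sample)
    for s in spans:
        print(s.style, s.begin, s.end, repr(text[s.begin:s.end]))
```

Font sizes, centering, and tables would need more state than this sketch tracks. A real annotator would more likely sit on top of an existing RTF parser than on a hand-rolled tokenizer like this one.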

Re: RTF Annotator?

Posted by Karthik Sarma <ks...@ksarma.com>.
I think such a tool would be quite useful -- I imagine that David isn't the
only person who works with RTF docs, and avoiding conversion should help us
glean additional information as James suggests.

Let me know if you need my assistance with anything!

--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
ksarma@ksarma.com
gchat: ksarma@gmail.com
linkedin: www.linkedin.com/in/ksarma


On Tue, Sep 3, 2013 at 11:36 AM, Masanz, James J. <Ma...@mayo.edu> wrote:

> I think text formatting is a natural candidate for being turned into
> annotations. Just one example - some people use formatting to indicate
> section headings, and there could be a sectionizer that uses RTF tags as-is
> to determine sections, or uses them as features at least.
>
> -- James
>
> > -----Original Message-----
> > From: dev-return-1935-Masanz.James=mayo.edu@ctakes.apache.org
> > [mailto:dev-return-1935-Masanz.James=mayo.edu@ctakes.apache.org]
> > On Behalf Of Pei Chen
> > Sent: Tuesday, September 03, 2013 9:10 AM
> > To: user@ctakes.apache.org; dev@ctakes.apache.org
> > Subject: Re: RTF Annotator?
> >
> > Hi David,
> > There is work being done on Tika/OCR integration, but I am not aware of
> > any cTAKES RTF Annotators.
> > What do others think? Having additional metadata such as this does sound
> > very interesting, especially with markup (bold/italics) and
> > semi-structured data such as tables...
> >
> > --Pei
> >
> >
> > On Sun, Sep 1, 2013 at 5:41 PM, David Kincaid
> > <ki...@gmail.com> wrote:
> >
> > > Before I embark on building an RTF annotator I thought I'd ask around
> > > a bit to see if anyone had built such a thing. Most of the medical
> > > notes I have to handle are in RTF format. I can pretty easily extract
> > > the text only using something like Apache Tika, but there is important
> > > information in the formatting as well (bold, italic, font sizes,
> > > centering, tables, etc.) that I'd like to use. Is anyone aware of a UIMA
> > > annotator that does this already?
> > >
> > > Thanks,
> > >
> > > Dave Kincaid
> > >
>

RE: RTF Annotator?

Posted by "Masanz, James J." <Ma...@mayo.edu>.
I think text formatting is a natural candidate for being turned into annotations. Just one example - some people use formatting to indicate section headings, and there could be a sectionizer that uses RTF tags as-is to determine sections, or uses them as features at least.

-- James
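
The sectionizer idea above can be sketched roughly as follows. This is a minimal Python illustration, assuming the formatting has already been turned into (begin, end) character offsets for bold runs over the plain text. The heuristic (a line that is entirely bold starts a new section) and the names sectionize and Section are assumptions made for the example, not anything that exists in cTAKES.

```python
from dataclasses import dataclass


@dataclass
class Section:
    heading: str
    begin: int   # offset where the section body starts
    end: int     # exclusive offset where the section body ends


def sectionize(text, bold_spans):
    """Split text into sections at lines that are entirely covered by a bold span.

    bold_spans is a list of (begin, end) offsets into text. Lines that appear
    before the first heading are ignored in this sketch.
    """
    sections = []
    current = None
    offset = 0
    for line in text.split("\n"):
        begin, end = offset, offset + len(line)
        offset = end + 1                               # step past the newline
        is_heading = begin < end and any(
            b <= begin and end <= e for b, e in bold_spans
        )
        body_edge = min(offset, len(text))             # last line has no newline
        if is_heading:
            current = Section(line.strip(), body_edge, body_edge)
            sections.append(current)
        elif current is not None:
            current.end = body_edge                    # extend the current body
    return sections


if __name__ == "__main__":
    text = "HISTORY\nPatient reports mild pain.\nNo fever.\nPLAN\nRest."
    plan = text.index("PLAN")
    for s in sectionize(text, [(0, 7), (plan, plan + 4)]):
        print(s.heading, repr(text[s.begin:s.end]))
```

Using the bold flag as one feature among several (line length, capitalization, a trailing colon) would probably be more robust than this all-or-nothing rule, since bold text inside a sentence is not a heading.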

> -----Original Message-----
> From: dev-return-1935-Masanz.James=mayo.edu@ctakes.apache.org [mailto:dev-
> return-1935-Masanz.James=mayo.edu@ctakes.apache.org] On Behalf Of Pei Chen
> Sent: Tuesday, September 03, 2013 9:10 AM
> To: user@ctakes.apache.org; dev@ctakes.apache.org
> Subject: Re: RTF Annotator?
> 
> Hi David,
> There is work being done on Tika/OCR integration, but I am not aware of
> any cTAKES RTF Annotators.
> What do others think? Having additional metadata such as this does sound
> very interesting, especially with markup (bold/italics) and semi-structured
> data such as tables...
> 
> --Pei
> 
> 
> On Sun, Sep 1, 2013 at 5:41 PM, David Kincaid
> <ki...@gmail.com> wrote:
> 
> > Before I embark on building an RTF annotator I thought I'd ask around
> > a bit to see if anyone had built such a thing. Most of the medical
> > notes I have to handle are in RTF format. I can pretty easily extract
> > the text only using something like Apache Tika, but there is important
> > information in the formatting as well (bold, italic, font sizes,
> > centering, tables, etc.) that I'd like to use. Is anyone aware of a UIMA
> > annotator that does this already?
> >
> > Thanks,
> >
> > Dave Kincaid
> >

Re: RTF Annotator?

Posted by Pei Chen <ch...@apache.org>.
Hi David,
There is work being done on Tika/OCR integration, but I am not aware of any
cTAKES RTF Annotators.
What do others think? Having additional metadata such as this does sound very
interesting, especially with markup (bold/italics) and semi-structured
data such as tables...

--Pei


On Sun, Sep 1, 2013 at 5:41 PM, David Kincaid <ki...@gmail.com> wrote:

> Before I embark on building an RTF annotator I thought I'd ask around a
> bit to see if anyone had built such a thing. Most of the medical notes I
> have to handle are in RTF format. I can pretty easily extract the text only
> using something like Apache Tika, but there is important information in the
> formatting as well (bold, italic, font sizes, centering, tables, etc.) that
> I'd like to use. Is anyone aware of a UIMA annotator that does this already?
>
> Thanks,
>
> Dave Kincaid
>
