Posted to dev@ctakes.apache.org by "Hari, Sekhar" <se...@cgi.com> on 2015/04/30 07:11:54 UTC

Image to text conversion

Hello All -

I am looking for an OCR capability in cTAKES. The requirement is to convert scanned image documents (e.g., scanned handwritten prescriptions) into text, and then apply the usual NLP pipeline to convert the unstructured text into structured data.

Can cTAKES convert scanned image documents into text? If so, please help me understand how by sharing any documentation or videos.

Many thanks,
Sekhar H.


Re: Image to text conversion

Posted by Pei Chen <ch...@apache.org>.
Sekhar,
There are a few open Jira issues. I think it would be a great contribution if you get this to work:

   - CTAKES-189 <https://issues.apache.org/jira/browse/CTAKES-189>
     GSoC: Implement OCR/Tika to standardize text input for cTAKES
   - CTAKES-105 <https://issues.apache.org/jira/browse/CTAKES-105>
     Add Apache Tika integration


On Thu, Apr 30, 2015 at 1:21 AM, Hari, Sekhar <se...@cgi.com> wrote:

> Thanks. Let me try this, and I will let you know if I need any help.
>
> Cheers,
> Sekhar H.
>
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Thursday, April 30, 2015 10:44 AM
> To: dev@ctakes.apache.org; user@ctakes.apache.org
> Subject: Re: Image to text conversion
>
> What about using Apache Tika within cTAKES for this? Tika supports OCR
> through Tesseract:
>
> http://wiki.apache.org/tika/TikaOCR
>
> Cheers,
> Chris
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398) NASA Jet
> Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department University of
> Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
> -----Original Message-----
> From: <Hari>, Sekhar <se...@cgi.com>
> Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
> Date: Wednesday, April 29, 2015 at 10:11 PM
> To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>, "
> user@ctakes.apache.org" <us...@ctakes.apache.org>
> Subject: Image to text conversion
>
> >Hello All -
> >
> >I am looking for an OCR capability in cTAKES. The requirement is to
> >convert scanned image documents (e.g., scanned handwritten
> >prescriptions) into text, and then apply the usual NLP pipeline to
> >convert the unstructured text into structured data.
> >
> >Can cTAKES convert scanned image documents into text? If so, please
> >help me understand how by sharing any documentation or videos.
> >
> >Many thanks,
> >Sekhar H.
> >
>
>

RE: Image to text conversion

Posted by "Hari, Sekhar" <se...@cgi.com>.
Thanks. Let me try this, and I will let you know if I need any help.

Cheers,
Sekhar H.

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Thursday, April 30, 2015 10:44 AM
To: dev@ctakes.apache.org; user@ctakes.apache.org
Subject: Re: Image to text conversion

What about using Apache Tika within cTAKES for this? Tika supports OCR through Tesseract:

http://wiki.apache.org/tika/TikaOCR

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Wednesday, April 29, 2015 at 10:11 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>, "user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: Image to text conversion

>Hello All -
>
>I am looking for an OCR capability in cTAKES. The requirement is to
>convert scanned image documents (e.g., scanned handwritten
>prescriptions) into text, and then apply the usual NLP pipeline to
>convert the unstructured text into structured data.
>
>Can cTAKES convert scanned image documents into text? If so, please
>help me understand how by sharing any documentation or videos.
>
>Many thanks,
>Sekhar H.
>


Re: Image to text conversion

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
What about using Apache Tika within cTAKES for this? Tika supports
OCR through Tesseract:

http://wiki.apache.org/tika/TikaOCR
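
To make the idea concrete, here is a minimal sketch of the OCR step run as a
pre-processing stage outside cTAKES. It is not the Tika or cTAKES API: it
assumes the Tesseract command-line tool (the engine Tika delegates to) is
installed and on the PATH, and it calls it directly. The function names
build_tesseract_command and ocr_image_to_text_file, and the "eng" language
default, are hypothetical choices for illustration.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, language="eng"):
    """Build the argument list for the Tesseract CLI.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing <outputbase>.txt.
    """
    return ["tesseract", str(image_path), "stdout", "-l", language]


def ocr_image_to_text_file(image_path, output_dir, language="eng"):
    """Run OCR on one image and save the result as a .txt file.

    Assumes the `tesseract` executable is installed and on PATH.
    Returns the path of the text file that was written.
    """
    image_path = Path(image_path)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )

    # One text file per image, named after the image (scan01.png -> scan01.txt).
    text_path = output_dir / (image_path.stem + ".txt")
    text_path.write_text(result.stdout, encoding="utf-8")
    return text_path
```

The resulting directory of .txt files could then be fed to a cTAKES pipeline
like any other plain-text input. Note that Tesseract is aimed mainly at
printed text, so results on handwritten prescriptions may be poor and should
be checked before relying on them.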

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Wednesday, April 29, 2015 at 10:11 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>,
"user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: Image to text conversion

>Hello All -
>
>I am looking for an OCR capability in cTAKES. The requirement is to convert
>scanned image documents (e.g., scanned handwritten prescriptions) into
>text, and then apply the usual NLP pipeline to convert the
>unstructured text into structured data.
>
>Can cTAKES convert scanned image documents into text? If so, please
>help me understand how by sharing any documentation or videos.
>
>Many thanks,
>Sekhar H.
>

