Posted to dev@ctakes.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2015/06/07 01:49:28 UTC
Integration of Tika with cTAKES
Hey cTAKES peeps!
We went ahead and integrated Tika with cTAKES for a project I’m
working on at JPL. It will be part of the 1.9 release of Tika. You
can check it out here:
https://wiki.apache.org/tika/cTAKESParser
Feedback welcomed. cTAKES is rad!
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: Integration of Tika with cTAKES
Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yep, you probably need to train Tesseract. See this link:
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Once Tesseract is trained, e.g., on the type of handwritten note
you are dealing with, it will perform better when called through
Tika.
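As a rough illustration of "called through Tika", here is a minimal Python sketch that prepares an OCR request for a Tika server and names the trained Tesseract language to use. Assumptions, none of them from this thread: a Tika server listening at http://localhost:9998, a tika-server release that honors the X-Tika-OCRLanguage header, and a trained model installed in Tesseract's tessdata directory under the hypothetical name "handwriting". The request is only built here, not sent.

```python
import urllib.request


def build_ocr_request(image_bytes, language, server="http://localhost:9998"):
    """Build (but do not send) a PUT request asking a Tika server to OCR an image.

    `language` is the name of a Tesseract language/model; "handwriting"
    below is a hypothetical name for a custom-trained model.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/tika",
        data=image_bytes,
        method="PUT",
        headers={
            # Assumed to be supported by the deployed tika-server version.
            "X-Tika-OCRLanguage": language,
            # Ask for plain text back.
            "Accept": "text/plain",
        },
    )


# Placeholder bytes stand in for a scanned note (e.g. the contents of a TIFF file).
request = build_ocr_request(b"<image bytes>", "handwriting")
print(request.get_method(), request.full_url)
```

With a server running, urllib.request.urlopen(request).read() would return the extracted text. Comparing that output for "eng" and for the trained model is one quick way to see whether training helped.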
Cheers,
Chris
-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Date: Tuesday, June 9, 2015 at 5:26 AM
To: jpluser <ch...@jpl.nasa.gov>
Cc: "user@ctakes.apache.org" <us...@ctakes.apache.org>,
"dev@ctakes.apache.org" <de...@ctakes.apache.org>, "dev@tika.a.o"
<de...@tika.a.o>
Subject: RE: Integration of Tika with cTAKES
>Hello Chris -
>
>I tried the methods described in the link you shared. It has an OCR
>feature, but I was unable to configure it to read a handwritten note. The
>software could not recognize anything handwritten, although it accurately
>recognized everything that was machine printed.
>
>Any idea how to train Tika so it can read and convert handwritten
>documents?
>
>Thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Monday, June 08, 2015 11:20 AM
>To: dev@ctakes.apache.org; user@ctakes.apache.org
>Subject: Re: Integration of Tika with cTAKES
>
>Hi Sekhar,
>
>[BCC to dev@tika.a.o to keep them in the loop]
>
>Sure, you can do this with Tika and Tesseract. FYI:
>
>http://wiki.apache.org/tika/TikaOCR/
>
>Enjoy! :)
>
>(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
>to see how to run cTAKES on the result with Tika)
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: <Hari>, Sekhar <se...@cgi.com>
>Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
>Date: Sunday, June 7, 2015 at 10:27 PM
>To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>,
>"user@ctakes.apache.org" <us...@ctakes.apache.org>
>Subject: RE: Integration of Tika with cTAKES
>
>>Hello Pei, all -
>>
>>I am looking to convert handwritten image documents (e.g., a physician's
>>handwritten medical prescription) into a text file. The image
>>documents can be in PDF, TIFF, GIF, or other formats. Can Tika or
>>Tesseract do this? Can anybody share their experience with this? Also,
>>if it is possible with Tika, please send me a step-by-step
>>guide.
>>
>>Many thanks,
>>Sekhar H.
>>
>>-----Original Message-----
>>From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
>>Sent: Sunday, June 07, 2015 10:34 PM
>>To: <de...@ctakes.apache.org>
>>Subject: Re: Integration of Tika with cTAKES
>>
>>This looks awesome.
>>Perhaps we can reuse the Tika server on the ctakes demo VM.
>>
>>Sent from my iPhone
>>
>>> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com>
>>>wrote:
>>>
>>> This is awesome; thanks!
>>>
>>> For some of the new ctakes projects where folks are aiming at
>>> using it with big data tooling, the Tika abstraction might be super
>>> useful.
>>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>
>>>> Hey cTAKES peeps!
>>>>
>>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>>> can check it out here:
>>>>
>>>> https://wiki.apache.org/tika/cTAKESParser
>>>>
>>>>
>>>> Feedback welcomed. cTAKES is rad!
>>>>
>>>> Cheers,
>>>> Chris
>>>>
RE: Integration of Tika with cTAKES
Posted by "Hari, Sekhar" <se...@cgi.com>.
Hello Chris -
I tried the methods described in the link you shared. It has an OCR feature, but I was unable to configure it to read a handwritten note. The software could not recognize anything handwritten, although it accurately recognized everything that was machine printed.
Any idea how to train Tika so it can read and convert handwritten documents?
Thanks,
Sekhar H.
-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Monday, June 08, 2015 11:20 AM
To: dev@ctakes.apache.org; user@ctakes.apache.org
Subject: Re: Integration of Tika with cTAKES
Hi Sekhar,
[BCC to dev@tika.a.o to keep them in the loop]
Sure, you can do this with Tika and Tesseract. FYI:
http://wiki.apache.org/tika/TikaOCR/
Enjoy! :)
(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
to see how to run cTAKES on the result with Tika)
Cheers,
Chris
-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Sunday, June 7, 2015 at 10:27 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>, "user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: RE: Integration of Tika with cTAKES
>Hello Pei, all -
>
>I am looking to convert handwritten image documents (e.g., a physician's
>handwritten medical prescription) into a text file. The image
>documents can be in PDF, TIFF, GIF, or other formats. Can Tika or
>Tesseract do this? Can anybody share their experience with this? Also,
>if it is possible with Tika, please send me a step-by-step guide.
>
>Many thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
>Sent: Sunday, June 07, 2015 10:34 PM
>To: <de...@ctakes.apache.org>
>Subject: Re: Integration of Tika with cTAKES
>
>This looks awesome.
>Perhaps we can reuse the Tika server on the ctakes demo VM.
>
>Sent from my iPhone
>
>> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com>
>>wrote:
>>
>> This is awesome; thanks!
>>
>> For some of the new ctakes projects where folks are aiming at
>> using it with big data tooling, the Tika abstraction might be super useful.
>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey cTAKES peeps!
>>>
>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>> can check it out here:
>>>
>>> https://wiki.apache.org/tika/cTAKESParser
>>>
>>>
>>> Feedback welcomed. cTAKES is rad!
>>>
>>> Cheers,
>>> Chris
>>>
Re: Integration of Tika with cTAKES
Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Sekhar,
[BCC to dev@tika.a.o to keep them in the loop]
Sure, you can do this with Tika and Tesseract. FYI:
http://wiki.apache.org/tika/TikaOCR/
Enjoy! :)
(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
to see how to run cTAKES on the result with Tika)
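Before going through Tika, it can help to confirm that Tesseract alone reads a sample page. Below is a minimal Python sketch that assembles the standard "tesseract <image> <outputbase> -l <lang>" command line. The file names are hypothetical, and it assumes the tesseract binary is on the PATH. The command is only constructed here; the commented lines show how it could be run.

```python
def tesseract_command(image_path, output_base, language="eng"):
    """Return the argv list for a basic Tesseract run.

    Tesseract writes the recognized text to `<output_base>.txt`.
    """
    return ["tesseract", image_path, output_base, "-l", language]


# Hypothetical file names for a scanned prescription.
command = tesseract_command("prescription.tif", "prescription")
print(" ".join(command))

# To actually run it (requires Tesseract to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

If the resulting text file looks reasonable for printed pages but not for handwriting, that matches the behavior reported elsewhere in this thread.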
Cheers,
Chris
-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Sunday, June 7, 2015 at 10:27 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>,
"user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: RE: Integration of Tika with cTAKES
>Hello Pei, all -
>
>I am looking to convert handwritten image documents (e.g., a physician's
>handwritten medical prescription) into a text file. The image
>documents can be in PDF, TIFF, GIF, or other formats. Can Tika or Tesseract
>do this? Can anybody share their experience with this? Also, if it is
>possible with Tika, please send me a step-by-step guide.
>
>Many thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
>Sent: Sunday, June 07, 2015 10:34 PM
>To: <de...@ctakes.apache.org>
>Subject: Re: Integration of Tika with cTAKES
>
>This looks awesome.
>Perhaps we can reuse the Tika server on the ctakes demo VM.
>
>Sent from my iPhone
>
>> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com>
>>wrote:
>>
>> This is awesome; thanks!
>>
>> For some of the new ctakes projects where folks are aiming at using
>> it with big data tooling, the Tika abstraction might be super useful.
>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey cTAKES peeps!
>>>
>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>> can check it out here:
>>>
>>> https://wiki.apache.org/tika/cTAKESParser
>>>
>>>
>>> Feedback welcomed. cTAKES is rad!
>>>
>>> Cheers,
>>> Chris
>>>
Re: Integration of Tika with cTAKES
Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Sekhar,
[BCC to dev@tika.a.o to keep them in the loop]
Sure, you can do this with Tika and Tesseract. FYI:
http://wiki.apache.org/tika/TikaOCR/
Enjoy! :)
(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
to see how to run cTAKES on the result with Tika)
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Sunday, June 7, 2015 at 10:27 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>,
"user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: RE: Integration of Tika with cTAKES
>Hello Pei, all -
>
>I am looking to convert handwritten image documents (Ex: a physician's
>handwritten medical prescription) into a text format file. The image
>documents can be in a PDF, TIFF, GIF etc. formats. Can Tika or Tessaract
>do this? Can anybody share their experience about this? Also, if it is
>possible to do with Tika, request you to send me a step-by-step guide.
>
>Many thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
>Sent: Sunday, June 07, 2015 10:34 PM
>To: <de...@ctakes.apache.org>
>Subject: Re: Integration of Tika with cTAKES
>
>This looks awesome.
>Perhaps we can reuse the Tika server on the ctakes demo VM.
>
>Sent from my iPhone
>
>> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com>
>>wrote:
>>
>> This is awesome; thanks!
>>
>> For some of the new ctakes projects where fplks bc are aiming at using
>> it with big data tooling, the till abstraction might be super useful.
>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey cTAKES peeps!
>>>
>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>> can check it out here:
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__wiki.apache.org_
>>> tika_cTAKESParser&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppx
>>> eFU&r=huK2MFkj300qccT8OSuuoYhy_xEYujfPwiAxhPVz5WY&m=L070DL_WFb_1U_8jG
>>> dAbnv_Ggx5mnsTfV4Jba6oNNU8&s=vafA1g4UuwgflDIIfKBwceFE2mgCY3VVMJ_A1PaU
>>> PRM&e=
>>>
>>>
>>> Feedback welcomed. cTAKES is rad!
>>>
>>> Cheers,
>>> Chris
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398) NASA Jet
>>> Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: chris.a.mattmann@nasa.gov
>>> WWW:
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__sunset.usc.edu_-7
>>> Emattmann_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=h
>>> uK2MFkj300qccT8OSuuoYhy_xEYujfPwiAxhPVz5WY&m=L070DL_WFb_1U_8jGdAbnv_G
>>> gx5mnsTfV4Jba6oNNU8&s=gFv8mVTL-qCTpFgkWRIC8vlrkwOdiXHUWq2xtCUTI48&e=
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department University
>>> of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
Re: Integration of Tika with cTAKES
Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Sekhar,
[BCC to dev@tika.a.o to keep them in the loop]
Sure, you can do this with Tika and Tesseract. FYI:
http://wiki.apache.org/tika/TikaOCR/
Enjoy! :)
(pro tip: then check out: http://wiki.apache.org/tika/cTAKESParser
to see how to run cTAKES on the result with Tika)
Cheers,
Chris
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----Original Message-----
From: <Hari>, Sekhar <se...@cgi.com>
Reply-To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>
Date: Sunday, June 7, 2015 at 10:27 PM
To: "dev@ctakes.apache.org" <de...@ctakes.apache.org>,
"user@ctakes.apache.org" <us...@ctakes.apache.org>
Subject: RE: Integration of Tika with cTAKES
>Hello Pei, all -
>
>I am looking to convert handwritten image documents (Ex: a physician's
>handwritten medical prescription) into a text format file. The image
>documents can be in a PDF, TIFF, GIF etc. formats. Can Tika or Tessaract
>do this? Can anybody share their experience about this? Also, if it is
>possible to do with Tika, request you to send me a step-by-step guide.
>
>Many thanks,
>Sekhar H.
>
>-----Original Message-----
>From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
>Sent: Sunday, June 07, 2015 10:34 PM
>To: <de...@ctakes.apache.org>
>Subject: Re: Integration of Tika with cTAKES
>
>This looks awesome.
>Perhaps we can reuse the Tika server on the ctakes demo VM.
>
>Sent from my iPhone
>
>> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com>
>>wrote:
>>
>> This is awesome; thanks!
>>
>> For some of the new ctakes projects where fplks bc are aiming at using
>> it with big data tooling, the till abstraction might be super useful.
>> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>
>>> Hey cTAKES peeps!
>>>
>>> We went ahead and integrated Tika with cTAKES for a project I'm
>>> working on at JPL. It will be part of the 1.9 release of Tika. You
>>> can check it out here:
>>>
>>> https://wiki.apache.org/tika/cTAKESParser
>>>
>>>
>>> Feedback welcomed. cTAKES is rad!
>>>
>>> Cheers,
>>> Chris
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398) NASA Jet
>>> Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: chris.a.mattmann@nasa.gov
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department University
>>> of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
RE: Integration of Tika with cTAKES
Posted by "Hari, Sekhar" <se...@cgi.com>.
Hello Pei, all -
I am looking to convert handwritten image documents (e.g., a physician's handwritten medical prescription) into a text file. The image documents can be in PDF, TIFF, GIF, or similar formats. Can Tika or Tesseract do this? Can anybody share their experience with this? Also, if it is possible with Tika, please send me a step-by-step guide.
Many thanks,
Sekhar H.
-----Original Message-----
From: Chen, Pei [mailto:Pei.Chen@childrens.harvard.edu]
Sent: Sunday, June 07, 2015 10:34 PM
To: <de...@ctakes.apache.org>
Subject: Re: Integration of Tika with cTAKES
This looks awesome.
Perhaps we can reuse the Tika server on the ctakes demo VM.
Sent from my iPhone
> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com> wrote:
>
> This is awesome; thanks!
>
> For some of the new ctakes projects where folks are aiming at using
> it with big data tooling, the Tika abstraction might be super useful.
> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey cTAKES peeps!
>>
>> We went ahead and integrated Tika with cTAKES for a project I'm
>> working on at JPL. It will be part of the 1.9 release of Tika. You
>> can check it out here:
>>
>> https://wiki.apache.org/tika/cTAKESParser
>>
>>
>> Feedback welcomed. cTAKES is rad!
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398) NASA Jet
>> Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department University
>> of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
Re: Integration of Tika with cTAKES
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
This looks awesome.
Perhaps we can reuse the Tika server on the ctakes demo VM.
Sent from my iPhone
> On Jun 6, 2015, at 8:40 PM, jay vyas <ja...@gmail.com> wrote:
>
> This is awesome; thanks!
>
> For some of the new ctakes projects where folks are aiming at using it
> with big data tooling, the Tika abstraction might be super useful.
> On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hey cTAKES peeps!
>>
>> We went ahead and integrated Tika with cTAKES for a project I’m
>> working on at JPL. It will be part of the 1.9 release of Tika. You
>> can check it out here:
>>
>> https://wiki.apache.org/tika/cTAKESParser
>>
>>
>> Feedback welcomed. cTAKES is rad!
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: chris.a.mattmann@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
Re: Integration of Tika with cTAKES
Posted by jay vyas <ja...@gmail.com>.
This is awesome; thanks!
For some of the new ctakes projects where folks are aiming at using it
with big data tooling, the Tika abstraction might be super useful.
On Jun 6, 2015 8:19 PM, "Mattmann, Chris A (3980)" <
chris.a.mattmann@jpl.nasa.gov> wrote:
> Hey cTAKES peeps!
>
> We went ahead and integrated Tika with cTAKES for a project I’m
> working on at JPL. It will be part of the 1.9 release of Tika. You
> can check it out here:
>
> https://wiki.apache.org/tika/cTAKESParser
>
>
> Feedback welcomed. cTAKES is rad!
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>