Posted to notifications@ctakes.apache.org by "Pei Chen (JIRA)" <ji...@apache.org> on 2012/11/27 18:19:58 UTC
[jira] [Created] (CTAKES-105) Add Apache Tika integration
Pei Chen created CTAKES-105:
-------------------------------
Summary: Add Apache Tika integration
Key: CTAKES-105
URL: https://issues.apache.org/jira/browse/CTAKES-105
Project: cTAKES
Issue Type: New Feature
Reporter: Pei Chen
Priority: Minor
Fix For: future enhancement
It would be nice to add a util/pre-processor that accepts any document type (scanned PDF, image, Word, PDF, XLS, etc.) and uses something like Apache Tika to automatically detect the type, OCR it if needed, extract the plain text, and then feed that text into the pipeline.
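In practice, the detection and extraction steps would come from Tika itself. For example, its org.apache.tika.Tika facade offers detect(File), which returns a MIME type string, and parseToString(File), which returns the extracted text. OCR additionally depends on an external OCR engine being available.

The sketch below shows the shape of the "detect the type, then decide how to get plain text" step without any dependencies. It assumes routing on a few well-known leading file signatures ("magic bytes"). The class DocumentRouter, its Kind and Action enums, and the routing rules are all hypothetical and are not part of cTAKES or Tika. The signature checks also cover only a tiny subset of what Tika's detector handles.

```java
/**
 * Hypothetical, dependency-free sketch of a pre-processor's type-detection and
 * routing step. Not a cTAKES or Tika API; a real implementation would delegate
 * detection and text extraction to a library such as Apache Tika.
 */
final class DocumentRouter {

    /** Coarse document kinds, identified only from leading "magic" bytes. */
    enum Kind { PDF, IMAGE, OLE2_OFFICE, ZIP_CONTAINER, PLAIN_TEXT, UNKNOWN }

    /** Hypothetical next step for obtaining plain text for the pipeline. */
    enum Action { EXTRACT_TEXT, OCR, PASS_THROUGH, REJECT }

    private DocumentRouter() {}

    /** True if data begins with the given signature (bytes given as unsigned ints). */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    /** Crude heuristic: no control characters other than tab, LF and CR. */
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            int c = b & 0xFF;
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') return false;
        }
        return true;
    }

    /** Guesses the document kind from its first bytes. */
    static Kind detect(byte[] data) {
        if (startsWith(data, 0x25, 0x50, 0x44, 0x46)) return Kind.PDF;            // "%PDF"
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)                              // PNG
                || startsWith(data, 0xFF, 0xD8, 0xFF)                             // JPEG
                || startsWith(data, 0x49, 0x49, 0x2A, 0x00)                       // TIFF, little-endian
                || startsWith(data, 0x4D, 0x4D, 0x00, 0x2A)) return Kind.IMAGE;   // TIFF, big-endian
        if (startsWith(data, 0xD0, 0xCF, 0x11, 0xE0)) return Kind.OLE2_OFFICE;    // legacy .doc/.xls container
        if (startsWith(data, 0x50, 0x4B, 0x03, 0x04)) return Kind.ZIP_CONTAINER;  // .docx/.xlsx are ZIP-based
        if (data.length > 0 && looksLikeText(data)) return Kind.PLAIN_TEXT;
        return Kind.UNKNOWN;
    }

    /** Maps a detected kind to the step that would produce plain text. */
    static Action route(Kind kind) {
        switch (kind) {
            case IMAGE:
                return Action.OCR;              // images have no text layer at all
            case PDF:                           // a scanned PDF would also need OCR (not detected here)
            case OLE2_OFFICE:
            case ZIP_CONTAINER:
                return Action.EXTRACT_TEXT;
            case PLAIN_TEXT:
                return Action.PASS_THROUGH;
            default:
                return Action.REJECT;
        }
    }
}
```

For example, DocumentRouter.route(DocumentRouter.detect(bytes)) yields OCR for a PNG or JPEG scan and PASS_THROUGH for a plain-text note. Some cases require looking past the first few bytes, which is the kind of work a library like Tika already does. These include telling Word from Excel inside the OLE2 or ZIP containers and spotting a scanned PDF that has no text layer.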
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira