Posted to notifications@ctakes.apache.org by "Pei Chen (JIRA)" <ji...@apache.org> on 2012/11/27 18:19:58 UTC

[jira] [Created] (CTAKES-105) Add Apache Tika integration

Pei Chen created CTAKES-105:
-------------------------------

             Summary: Add Apache Tika integration
                 Key: CTAKES-105
                 URL: https://issues.apache.org/jira/browse/CTAKES-105
             Project: cTAKES
          Issue Type: New Feature
            Reporter: Pei Chen
            Priority: Minor
             Fix For: future enhancement


It would be nice to add a util/pre-processor that accepts any document type (scanned PDF, image, Word, PDF, XLS, etc.), uses something like Apache Tika to automatically detect the type, runs OCR where needed, extracts the plain text, and then feeds that text to the pipeline.
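A minimal sketch of the detect → extract → feed flow described above. Everything here is hypothetical: the class `DocumentPreprocessor`, its `register`/`process` methods, and the `Consumer<String>` "pipeline" hook are illustrative names, not cTAKES or Tika API. The magic-byte sniffing is a toy stand-in for Tika's type detection (in a real integration, Tika's `org.apache.tika.Tika` facade with its `detect(...)` and `parseToString(...)` methods would do this job), and the PDF extractor in `main` is a placeholder stub, not real extraction or OCR.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical pre-processor sketch: detect a document's type, extract plain
 * text with a registered extractor, and hand that text to the pipeline.
 * The magic-byte detection is a toy stand-in for Apache Tika, NOT Tika's API.
 */
final class DocumentPreprocessor {

    private final Map<String, Function<byte[], String>> extractors = new HashMap<>();
    private final Consumer<String> pipeline; // hypothetical hook into the cTAKES pipeline

    DocumentPreprocessor(Consumer<String> pipeline) {
        this.pipeline = pipeline;
        // Plain text needs no conversion, only decoding.
        extractors.put("text/plain", data -> new String(data, StandardCharsets.UTF_8));
    }

    /** Register an extractor (e.g. one backed by Tika or an OCR engine) for a MIME type. */
    void register(String mimeType, Function<byte[], String> extractor) {
        extractors.put(mimeType, extractor);
    }

    /** Guess a MIME type from well-known leading "magic" bytes. */
    static String detectType(byte[] data) {
        if (startsWith(data, 0x25, 0x50, 0x44, 0x46)) return "application/pdf"; // "%PDF"
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "image/png";       // 0x89 "PNG"
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (startsWith(data, 0x50, 0x4B, 0x03, 0x04)) return "application/zip"; // .docx/.xlsx are ZIP containers
        return "text/plain"; // naive assumption: anything unrecognized is treated as text
    }

    // True if data begins with the given byte signature (compared as unsigned values).
    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    /** Detect the type, extract plain text, feed it to the pipeline, and return it. */
    String process(byte[] data) {
        String type = detectType(data);
        Function<byte[], String> extractor = extractors.get(type);
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for " + type);
        }
        String text = extractor.apply(data);
        pipeline.accept(text);
        return text;
    }

    public static void main(String[] args) {
        StringBuilder received = new StringBuilder();
        DocumentPreprocessor pre = new DocumentPreprocessor(received::append);
        // Placeholder stub: a real version would call Tika (and OCR for scanned PDFs) here.
        pre.register("application/pdf", data -> "[text extracted from PDF]");
        pre.process("Patient denies chest pain.".getBytes(StandardCharsets.UTF_8));
        pre.process("%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII));
        System.out.println(received);
    }
}
```

Keeping one extractor per MIME type means the pipeline itself never needs to know about input formats; replacing the toy detector with Tika would only change `detectType` and the registered extractors.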


--
This message is automatically generated by JIRA.