Posted to dev@tika.apache.org by "Matthew Caruana Galizia (JIRA)" <ji...@apache.org> on 2017/10/06 10:32:00 UTC
[jira] [Created] (TIKA-2473) PCX and DCX image support
Matthew Caruana Galizia created TIKA-2473:
---------------------------------------------
Summary: PCX and DCX image support
Key: TIKA-2473
URL: https://issues.apache.org/jira/browse/TIKA-2473
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.16
Reporter: Matthew Caruana Galizia
It's straightforward in theory to implement support for PCX and DCX. Both formats are supported in Commons Imaging as well as in ImageIO via TwelveMonkeys.
In practice, however, I'm not really sure how to implement support. We obviously want to OCR the images, but Tesseract has no support for these formats. So where do we do the conversion to a BufferedImage? I tried to look for what is done to handle JBIG2 files, but I can't find that anywhere.
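To make the conversion step concrete, here is a rough sketch of what it might look like with plain ImageIO. It assumes an ImageIO reader plugin for PCX (for example the TwelveMonkeys one) is on the classpath; without it, ImageIO.read returns null for PCX input. The class and method names are hypothetical, and the sketch says nothing about where in Tika this should live. It only decodes to a BufferedImage and re-encodes as PNG, a format Tesseract can read.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Tika.
class PcxToPngSketch {

    // Decode any format for which an ImageIO reader is registered.
    // PCX only works if a plugin providing a PCX reader is on the classpath (assumption).
    static BufferedImage decode(InputStream in) throws IOException {
        BufferedImage image = ImageIO.read(in);
        if (image == null) {
            // ImageIO.read returns null when no registered reader claims the input.
            throw new IOException("No ImageIO reader available for this input");
        }
        return image;
    }

    // Re-encode the decoded image as PNG so it can be handed to an OCR step.
    static byte[] toPng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No ImageIO writer available for PNG");
        }
        return out.toByteArray();
    }

    // Convenience wrapper: source image stream in, PNG bytes out.
    static byte[] convertToPng(InputStream in) throws IOException {
        return toPng(decode(in));
    }
}
```

The PNG bytes could then be written to a temporary file and passed to Tesseract the same way other image types are. DCX is a multi-page container, so it would presumably need a per-page loop that this sketch does not cover.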
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)