Posted to dev@tika.apache.org by "James Baker (JIRA)" <ji...@apache.org> on 2014/09/23 21:06:34 UTC
[jira] [Comment Edited] (TIKA-1396) Embedded images in PDF documents
[ https://issues.apache.org/jira/browse/TIKA-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145261#comment-14145261 ]
James Baker edited comment on TIKA-1396 at 9/23/14 7:05 PM:
------------------------------------------------------------
The attachment tika_images.pdf doesn't have its image extracted. I tested with tika-app-1.6.jar with the relevant PDFParser property changed, and also with some bespoke image extraction code (which works for other document types) based on the Alfresco source.
was (Author: james.d.baker):
This PDF file doesn't have the image extracted - tested with tika-app-1.6.jar with the relevant PDFParser property changed, and also using some bespoke image extraction code (that works for other document types) based on the Alfresco source.
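For anyone trying to reproduce this: assuming the PDFParser property referred to above is extractInlineImages (inline image extraction from PDFs is off by default in Tika 1.x), the change in PDFParser.properties is a one-line edit along these lines. This is a sketch only; the other keys in that file and how a modified copy gets onto the classpath ahead of the bundled one are not shown and may differ between Tika versions.

```properties
# Assumed key: ask PDFParser to emit inline images as embedded resources.
# Tika ships this as "false"; the other keys in PDFParser.properties stay unchanged.
extractInlineImages true
```

With this set, `--extract` should write out any inline images PDFParser finds, alongside the other embedded resources.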
> Embedded images in PDF documents
> --------------------------------
>
> Key: TIKA-1396
> URL: https://issues.apache.org/jira/browse/TIKA-1396
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.5
> Environment: *OS:*
> Ubuntu 14.04.1 LTS
> *KERNEL:*
> 3.13.0-33-generic
> gcc version 4.8.2
> *JAVA:*
> java version "1.8.0_11"
> Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
> Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
> Reporter: Damiano
> Priority: Critical
> Fix For: 1.6
>
> Attachments: tika_images.pdf
>
>
> Hello!
> I just found a problem with PDF documents that have embedded images.
> Doing:
> java -jar tika-app-1.5.jar --extract tika.pdf
> Tika cannot find the image.
> Is this a PDF-related problem? If I do the same operation with a DOC document, Tika finds the image correctly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)