Posted to dev@tika.apache.org by "Tika User (Jira)" <ji...@apache.org> on 2022/08/03 12:17:00 UTC

[jira] [Commented] (TIKA-3827) Word Document extracted mpga file extension instead of bitmap

    [ https://issues.apache.org/jira/browse/TIKA-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17574695#comment-17574695 ] 

Tika User commented on TIKA-3827:
---------------------------------

Its file type is detected as RF, and when the content is extracted, the embedded content comes out as two files with the .mpga extension.
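One way to check whether the two extracted .mpga files are really bitmaps is to look at their leading bytes. The helper below is a hypothetical sketch (EmbeddedSignatureCheck is not part of Tika). It relies only on well-known signatures: BMP files start with the ASCII bytes "BM", MP3 data often starts with an "ID3" tag, and MPEG audio frames start with an 11-bit frame sync (0xFF followed by a byte whose top three bits are set).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Tika): guesses whether an extracted
 * embedded file looks like a BMP bitmap or MPEG audio, using only its
 * leading "magic" bytes.
 */
public class EmbeddedSignatureCheck {

    /** Returns a rough type guess based on the first bytes of the file. */
    static String guess(byte[] head) {
        // BMP files begin with the ASCII characters "BM".
        if (head.length >= 2 && head[0] == 'B' && head[1] == 'M') {
            return "image/bmp";
        }
        // An ID3v2 tag ("ID3") commonly precedes MP3 audio data.
        if (head.length >= 3 && head[0] == 'I' && head[1] == 'D' && head[2] == '3') {
            return "audio/mpeg";
        }
        // MPEG audio frame sync: 0xFF, then a byte with its top three bits set.
        // Only two bytes are checked, so this is a weak signal.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xE0) == 0xE0) {
            return "audio/mpeg (frame sync only, weak signal)";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Print a guess for every file path passed on the command line.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Path.of(name));
            System.out.println(name + ": " + guess(bytes));
        }
    }
}
```

Run it with: java EmbeddedSignatureCheck.java first.mpga second.mpga (the file names are placeholders; launching a single source file needs Java 11 or later). A match on the two-byte frame sync does not prove the data is audio, and a missing "BM" prefix does not by itself rule out image data, so the result is only a first hint.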

> Word Document extracted mpga file extension instead of bitmap 
> --------------------------------------------------------------
>
>                 Key: TIKA-3827
>                 URL: https://issues.apache.org/jira/browse/TIKA-3827
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Tika User
>            Priority: Major
>         Attachments: example.DOC
>
>
> When we tried to parse the .doc document, Tika extracted two .mpga files that cannot be opened or played. We suspect they should be bitmap image files. We are using Tika 2.4.1.
> [^example.DOC]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)