Posted to dev@tika.apache.org by "Sam Stephens (Jira)" <ji...@apache.org> on 2022/04/01 03:28:00 UTC

[jira] [Updated] (TIKA-3711) Image file names included in parsed Word Document text

     [ https://issues.apache.org/jira/browse/TIKA-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Stephens updated TIKA-3711:
-------------------------------
    Description: 
The attached Word document includes nothing but a single image. Running it through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it through the Tika 2.3.0 AutoDetectParser returns the text:

{{image1.png}}

  was:
The attached Word document includes nothing but a single image. Running it through the Tika 2.2.0 AutoDetectParser correctly returns no text. Running it through the Tika 2.3.0 AutoDetectParser returns the text:


{{image1.png}}

> Image file names included in parsed Word Document text
> ------------------------------------------------------
>
>                 Key: TIKA-3711
>                 URL: https://issues.apache.org/jira/browse/TIKA-3711
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 2.3.0
>            Reporter: Sam Stephens
>            Priority: Major
>         Attachments: word-doc-with-image.docx
>
>
> The attached Word document includes nothing but a single image. Running it through the Tika 2.2.0 AutoDetectParser correctly returns null. Running it through the Tika 2.3.0 AutoDetectParser returns the text:
> {{image1.png}}
>  
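
One way to check where the stray text might come from is to list the image parts stored inside the .docx package and compare their names with the returned string. The sketch below does not use Tika at all. It assumes the usual OOXML layout, in which a .docx file is a zip archive and embedded images sit under word/media/. The class name DocxMediaLister and the method listMediaNames are hypothetical names chosen for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only; not part of Tika.
class DocxMediaLister {

    // Assumption: embedded images are stored under word/media/ in the package.
    private static final String MEDIA_PREFIX = "word/media/";

    /** Returns the file names of the media parts found in a .docx package. */
    static List<String> listMediaNames(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(docx);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            // Walk every zip entry and keep only files below the media folder.
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith(MEDIA_PREFIX)) {
                    // Strip the folder prefix so only the file name remains.
                    names.add(name.substring(MEDIA_PREFIX.length()));
                }
            }
        }
        return names;
    }
}
```

Calling DocxMediaLister.listMediaNames(Paths.get("word-doc-with-image.docx")) on the attachment and finding an entry that matches the text returned by 2.3.0 would suggest that the parser is emitting the embedded part's file name rather than any text from the document body.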



--
This message was sent by Atlassian Jira
(v8.20.1#820001)