Posted to dev@tika.apache.org by "Yahav Amsalem (Jira)" <ji...@apache.org> on 2021/01/02 20:45:00 UTC
[jira] [Created] (TIKA-3257) RAR files extracted content is not separated from the inner file names
Yahav Amsalem created TIKA-3257:
-----------------------------------
Summary: RAR files extracted content is not separated from the inner file names
Key: TIKA-3257
URL: https://issues.apache.org/jira/browse/TIKA-3257
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.23
Reporter: Yahav Amsalem
Attachments: test.rar
The attached RAR file contains a single PPT file ("test.ppt") with one line of text: "Here the PPT content starts".
However, the text that Tika extracts *does not separate the file name from the file's content*. The two run together:
"test.pptHere the PPT content starts"
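The symptom can be modelled without Tika. The sketch below is hypothetical. It assumes the archive parser reports the entry name and the entry's text as two adjacent elements (h1 and p here, which is an assumption, not Tika's actual output), and the names FusedTextDemo, PlainTextHandler, BlockAwareTextHandler and replay are invented for illustration. A SAX handler that only collects character data reproduces the fused string, while one that appends a newline when a block element closes keeps the name and the content apart. It models the symptom only and does not say where the real fix belongs in Tika.

```java
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical model of the TIKA-3257 symptom; not Tika code.
class FusedTextDemo {

    // Collects character data only and ignores element boundaries.
    static class PlainTextHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Additionally appends a newline whenever an (assumed) block element closes.
    static class BlockAwareTextHandler extends PlainTextHandler {
        private static final List<String> BLOCKS = List.of("p", "div", "h1");

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (BLOCKS.contains(localName)) {
                text.append('\n');
            }
        }
    }

    // Replays assumed events: the entry name in <h1>, then the entry's text in <p>.
    static String replay(PlainTextHandler handler) {
        try {
            emit(handler, "h1", "test.ppt");
            emit(handler, "p", "Here the PPT content starts");
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return handler.text.toString();
    }

    // Sends start-element, character and end-element events for one element.
    private static void emit(DefaultHandler h, String element, String body) throws SAXException {
        Attributes none = new AttributesImpl();
        h.startElement("", element, element, none);
        char[] chars = body.toCharArray();
        h.characters(chars, 0, chars.length);
        h.endElement("", element, element);
    }

    public static void main(String[] args) {
        System.out.println(replay(new PlainTextHandler()));
        System.out.println(replay(new BlockAwareTextHandler()));
    }
}
```

Running main prints the fused line reported above, followed by "test.ppt" and "Here the PPT content starts" on separate lines.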
--
This message was sent by Atlassian Jira
(v8.3.4#803005)