Posted to dev@tika.apache.org by "Yahav Amsalem (Jira)" <ji...@apache.org> on 2021/01/02 20:45:00 UTC

[jira] [Created] (TIKA-3257) RAR files extracted content is not separated from the inner file names

Yahav Amsalem created TIKA-3257:
-----------------------------------

             Summary: RAR files extracted content is not separated from the inner file names
                 Key: TIKA-3257
                 URL: https://issues.apache.org/jira/browse/TIKA-3257
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.23
            Reporter: Yahav Amsalem
         Attachments: test.rar

Attached is a RAR file containing a PPT file ("test.ppt") with a single line of text: "Here the PPT content starts".

However, the text Tika extracts does *not separate the file name from the file's content*. The two run together:

"test.pptHere the PPT content starts"
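The sketch below shows one possible shape for a regression check for this issue. It asserts that the entry name is followed by at least one whitespace character before the entry's content. `RarEntrySeparationCheck` and `isSeparated` are hypothetical names, not Tika API. The "expected" string, with the name on its own line, is also an assumption: the report only says the two should be separated, not how.

```java
import java.util.regex.Pattern;

public class RarEntrySeparationCheck {

    // Hypothetical helper (not part of Tika): true if the extracted text contains
    // the entry name, then one or more whitespace characters, then the entry content.
    public static boolean isSeparated(String extracted, String entryName, String entryContent) {
        // Pattern.quote so that characters like '.' in "test.ppt" match literally.
        String regex = Pattern.quote(entryName) + "\\s+" + Pattern.quote(entryContent);
        return Pattern.compile(regex).matcher(extracted).find();
    }

    public static void main(String[] args) {
        String name = "test.ppt";
        String content = "Here the PPT content starts";

        // Output reported in this issue: name and content are concatenated.
        String reported = "test.pptHere the PPT content starts";
        // Assumed desired output: the name on its own line, then the content.
        String expected = "test.ppt\nHere the PPT content starts\n";

        System.out.println("reported separated? " + isSeparated(reported, name, content));
        System.out.println("expected separated? " + isSeparated(expected, name, content));
    }
}
```

Against real output, `extracted` could come from Tika's facade, for example `new org.apache.tika.Tika().parseToString(new java.io.File("test.rar"))`. The check only requires some whitespace, so it does not depend on the exact separator the parser uses.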



--
This message was sent by Atlassian Jira
(v8.3.4#803005)