Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/06/23 19:12:00 UTC
[jira] [Created] (TIKA-3800) Consider wrapping 'unrar' commandline executable as a parser to handle rar v5
Tim Allison created TIKA-3800:
---------------------------------
Summary: Consider wrapping 'unrar' commandline executable as a parser to handle rar v5
Key: TIKA-3800
URL: https://issues.apache.org/jira/browse/TIKA-3800
Project: Tika
Issue Type: Task
Reporter: Tim Allison
Junrar is great and doesn't require any external dependencies. However, it doesn't handle rar v5. I've tried {{UNRAR 5.61 beta 1 freeware}} on some of the v5 files in our regression corpus, and I can confirm that Tika cannot handle them but unrar can.
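A parser that wraps unrar would presumably only be needed for archives Junrar cannot open. A minimal sketch of routing on the file signature, assuming that RAR 5.0 archives begin with the 8-byte marker `Rar!\x1A\x07\x01\x00` and RAR 1.5 to 4.x archives with the 7-byte marker `Rar!\x1A\x07\x00`. `RarVersionSniffer` and its methods are hypothetical names, not existing Tika API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper: classifies data by its leading RAR marker bytes. */
public final class RarVersionSniffer {

    // RAR 5.0 signature: "Rar!" 0x1A 0x07 0x01 0x00
    private static final byte[] RAR5_SIG = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00};
    // RAR 1.5 to 4.x signature: "Rar!" 0x1A 0x07 0x00
    private static final byte[] RAR4_SIG = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};

    public enum RarVersion { RAR4, RAR5, UNKNOWN }

    private RarVersionSniffer() {}

    /** Checks only the very start of the data, so self-extracting archives report UNKNOWN. */
    public static RarVersion sniff(byte[] head) {
        if (startsWith(head, RAR5_SIG)) {
            return RarVersion.RAR5; // would go to the (hypothetical) unrar-backed parser
        }
        if (startsWith(head, RAR4_SIG)) {
            return RarVersion.RAR4; // Junrar can handle these
        }
        return RarVersion.UNKNOWN;
    }

    /** Consumes up to 8 bytes; the caller must mark()/reset() if the stream is reused. */
    public static RarVersion sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(RAR5_SIG.length));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```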
The parser would need to create a temporary directory, copy the InputStream to a file in that directory, run unrar, process the extracted files, and then clean up the directory.
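The temporary-directory lifecycle described above could look roughly like this in Java. A minimal sketch, assuming the unrar invocation is passed in as a callback so the lifecycle can be exercised without the binary; `TempDirExtraction`, `Extractor`, and `FileHandler` are hypothetical names, not Tika classes. The `finally` block makes cleanup run even when extraction or a handler fails.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the temp-dir lifecycle from the issue; not Tika API. */
public final class TempDirExtraction {

    /** Runs an external extractor (e.g. unrar) on the archive, writing into outDir. */
    @FunctionalInterface
    public interface Extractor {
        void extract(Path archive, Path outDir) throws IOException, InterruptedException;
    }

    /** Receives each regular file the extractor produced. */
    @FunctionalInterface
    public interface FileHandler {
        void handle(Path extractedFile) throws IOException;
    }

    private TempDirExtraction() {}

    public static void process(InputStream archiveStream, Extractor extractor, FileHandler handler)
            throws IOException, InterruptedException {
        // 1. Create a private temporary working directory.
        Path workDir = Files.createTempDirectory("tika-unrar-");
        try {
            // 2. Copy the InputStream to a file inside it.
            Path archive = workDir.resolve("input.rar");
            Files.copy(archiveStream, archive, StandardCopyOption.REPLACE_EXISTING);
            // 3. Run the extractor into a separate output subdirectory.
            Path outDir = Files.createDirectory(workDir.resolve("out"));
            extractor.extract(archive, outDir);
            // 4. Hand every extracted regular file (symlinks skipped) to the caller.
            List<Path> files;
            try (Stream<Path> walk = Files.walk(outDir)) {
                files = walk.filter(p -> Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
                        .sorted()
                        .collect(Collectors.toList());
            }
            for (Path f : files) {
                handler.handle(f);
            }
        } finally {
            // 5. Clean up, deleting children before their parents.
            deleteRecursively(workDir);
        }
    }

    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root, LinkOption.NOFOLLOW_LINKS)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

In a real parser the `Extractor` would launch unrar and the `FileHandler` would pass each file on to Tika's embedded-document handling; both are left abstract here.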
We can get full path information from the {{l}} (list) command: {{unrar l blah.rar}}.
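Because {{e}} extracts without paths, files with the same name can collide; the {{-or}} switch tells unrar to rename them automatically instead of overwriting: {{unrar e -or bug_trackers/LIBRE_OFFICE/131138-137877/LIBRE_OFFICE-135119-0.rar}}.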
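The two invocations above could be driven from Java with `ProcessBuilder`. A minimal sketch, assuming an `unrar` binary on the `PATH` (or at a configured path), that exit code 0 means success, and that the listing is UTF-8 (in practice it depends on the locale). `UnrarCommands` and its methods are hypothetical names. Extraction runs with the output directory as the working directory, so the sketch does not rely on unrar's destination-path argument.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper around the unrar CLI; commands follow the issue text. */
public final class UnrarCommands {

    private UnrarCommands() {}

    /** "unrar l archive": lists entries, including their full paths. */
    public static List<String> listCommand(String unrar, Path archive) {
        return List.of(unrar, "l", archive.toString());
    }

    /** "unrar e -or archive": extracts without paths, auto-renaming name collisions. */
    public static List<String> extractCommand(String unrar, Path archive) {
        return List.of(unrar, "e", "-or", archive.toString());
    }

    /** Runs "unrar l" and returns raw output lines; parsing them is left to the caller. */
    public static List<String> runList(String unrar, Path archive, long timeoutSec)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(listCommand(unrar, archive))
                .redirectErrorStream(true)
                .start();
        p.getOutputStream().close(); // EOF on stdin so unrar cannot block on a prompt
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        // The timeout only bounds the wait after stdout has been fully read.
        waitOrKill(p, timeoutSec);
        return lines;
    }

    /** Runs "unrar e -or" with outDir as the working directory, so files land there. */
    public static void runExtract(String unrar, Path archive, Path outDir, long timeoutSec)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(extractCommand(unrar, archive.toAbsolutePath()))
                .directory(outDir.toFile())
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        p.getOutputStream().close();
        waitOrKill(p, timeoutSec);
    }

    private static void waitOrKill(Process p, long timeoutSec)
            throws IOException, InterruptedException {
        if (!p.waitFor(timeoutSec, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("unrar timed out after " + timeoutSec + "s");
        }
        if (p.exitValue() != 0) {
            throw new IOException("unrar exited with code " + p.exitValue());
        }
    }
}
```

Since {{e}} flattens the directory structure, the {{l}} listing is presumably the only place the original paths survive, so a parser would likely need both calls: one for path metadata and one for content.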
--
This message was sent by Atlassian Jira
(v8.20.7#820007)