Posted to user@commons.apache.org by Tim Allison <ta...@apache.org> on 2020/06/11 12:19:19 UTC
[COMPRESS] tar files and missing bytes?
All,
We recently made TikaInputStream's skip() inherently strict so that it
throws an EOF if a parser tries to skip past the end of a file. We didn't
notice any problems in our regression tests (aside from some likely
truncated mp4s), but we recently got an issue [1] from a user where this is
a problem for a tar file created by 7z [2].
Is this a valid tar, or are we right to throw an EOF?
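To make "strict" concrete, here is a minimal sketch of the behavior described above. It is not TikaInputStream's actual code: the class name StrictSkipInputStream and the wording of the exception message are hypothetical, chosen only for illustration.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper illustrating a "strict" skip(); not Tika's implementation.
class StrictSkipInputStream extends FilterInputStream {

    StrictSkipInputStream(InputStream in) {
        super(in);
    }

    // Skips exactly n bytes, or throws EOFException if the stream ends first.
    @Override
    public long skip(long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // InputStream.skip() may return 0 without being at EOF,
            // so read a single byte to tell the two cases apart.
            if (in.read() == -1) {
                throw new EOFException("tried to skip " + n + " bytes, but only "
                        + (n - remaining) + " were available");
            }
            remaining--;
        }
        return Math.max(n, 0);
    }
}
```

With a wrapper like this, a parser that asks to skip more bytes than the file still holds gets an EOFException instead of silently stopping at the end of the data.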
Thank you.
Best,
Tim
[1] https://issues.apache.org/jira/browse/TIKA-3110
[2] https://github.com/AlexOkayJ/apache-tika-tar-issue/blob/master/src/main/resources/7ztar.tar
Re: [COMPRESS] tar files and missing bytes?
Posted by Stefan Bodewig <bo...@apache.org>.
On 2020-06-11, Tim Allison wrote:
> We recently made TikaInputStream's skip() inherently strict so that it
> throws an EOF if a parser tries to skip past the end of a file. We didn't
> notice any problems in our regression tests (aside from some likely
> truncated mp4s), but we recently got an issue [1] from a user where this is
> a problem for a tar file created by 7z [2].
> Is this a valid tar, or are we right to throw an EOF?
Yes, it is, unfortunately. It somewhat depends on what you consider
"valid".
I saw the mail about the TIKA issue before I found this mail, see
https://issues.apache.org/jira/browse/TIKA-3110?focusedCommentId=17134328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17134328
Stefan