Posted to user@commons.apache.org by Tim Allison <ta...@apache.org> on 2020/06/11 12:19:19 UTC

[COMPRESS] tar files and missing bytes?

All,
  We recently made TikaInputStream's skip() inherently strict so that it
throws an EOF if a parser tries to skip past the end of a file.  We didn't
notice any problems in our regression tests (aside from some likely
truncated mp4s), but we recently got an issue [1] from a user where this is
a problem for a tar file created by 7z [2].
  Is this a valid tar, or are we right to throw an EOF?
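
  For readers unfamiliar with the behavior described above, here is a minimal
sketch of what a "strict" skip() can look like. It is an illustration only and
not TikaInputStream's actual code: the class name StrictSkipInputStream is
hypothetical, and the choice of java.io.EOFException is an assumption about
what "throws an EOF" means here.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper for illustration; not part of Tika or Commons Compress. */
class StrictSkipInputStream extends FilterInputStream {

    StrictSkipInputStream(InputStream in) {
        super(in);
    }

    /**
     * Skips exactly n bytes or throws EOFException if the stream ends first.
     * The plain InputStream.skip() contract would instead return a short count.
     */
    @Override
    public long skip(long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() may return 0 before the end of the stream, so probe with read().
            if (in.read() == -1) {
                throw new EOFException("tried to skip " + n + " bytes, but only "
                        + (n - remaining) + " were available");
            }
            remaining--; // the probe consumed one byte
        }
        return Math.max(n, 0);
    }
}

class StrictSkipDemo {
    public static void main(String[] args) throws IOException {
        InputStream s = new StrictSkipInputStream(new ByteArrayInputStream(new byte[10]));
        s.skip(4); // fine: 10 bytes available
        try {
            s.skip(7); // only 6 bytes left
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

  A reader that skips an entry's declared size plus padding would hit this
exception whenever the file ends earlier than the header implies, whereas a
lenient skip() would silently return a short count.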

         Thank you.

                   Best,

                       Tim

[1] https://issues.apache.org/jira/browse/TIKA-3110
[2] https://github.com/AlexOkayJ/apache-tika-tar-issue/blob/master/src/main/resources/7ztar.tar

Re: [COMPRESS] tar files and missing bytes?

Posted by Stefan Bodewig <bo...@apache.org>.
On 2020-06-11, Tim Allison wrote:

>   We recently made TikaInputStream's skip() inherently strict so that it
> throws an EOF if a parser tries to skip past the end of a file.  We didn't
> notice any problems in our regression tests (aside from some likely
> truncated mp4s), but we recently got an issue [1] from a user where this is
> a problem for a tar file created by 7z [2].

>   Is this a valid tar, or are we right to throw an EOF?

Yes, it is, unfortunately. It somewhat depends on what you consider
"valid".

I saw the mail about the TIKA issue before I found this mail, see
https://issues.apache.org/jira/browse/TIKA-3110?focusedCommentId=17134328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17134328

Stefan
