Posted to dev@commons.apache.org by Stefan Bodewig <bo...@apache.org> on 2013/12/22 08:45:23 UTC

[compress] Tika test error with IMPLODEd zip

Hi

even though Gump stopped nagging, it still works.  Tika is built by Gump
and they have a test they call testUnsupportedZipCompressionMethod in
which they seem to use an archive created with an IMPLODED entry.

This test started to fail when Emmanuel added support for IMPLODE, but
unfortunately it doesn't fail because we can now read the archive; it
fails because of an IndexOutOfBounds in BinaryTree:

<http://vmgump.apache.org/gump/public/tika/tika-parsers-test/gump_file/org.apache.tika.parser.pkg.ZipParserTest.txt.html>

Of course it is possible the archive isn't real and they used a
hex-editor to have an unparseable archive - but it may also be a sign we
have a bug in our code.

I'll try to locate the archive they use later and see whether it looks
genuine but maybe somebody from tika is lurking here and can shed some
light.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [compress] Tika test error with IMPLODEd zip

Posted by Emmanuel Bourg <eb...@apache.org>.
On 22/12/2013 12:40, Stefan Bodewig wrote:

> The archive is real, InfoZIP's unzip is willing to extract it without
> any errors.  I've created unit tests from it with svn revision 1552980 -
> not sure when I'll find time to analyze the error.

Thank you Stefan, I'll look into it.

Emmanuel Bourg




Re: [compress] Tika test error with IMPLODEd zip

Posted by Stefan Bodewig <bo...@apache.org>.
On 2013-12-22, Emmanuel Bourg wrote:

> On 22/12/2013 12:40, Stefan Bodewig wrote:

>> The archive is real, InfoZIP's unzip is willing to extract it without
>> any errors.  I've created unit tests from it with svn revision 1552980 -
>> not sure when I'll find time to analyze the error.

> The issue is now fixed. I used a short instead of an int as an index of
> the binary tree, but that wasn't wide enough to work with the biggest
> possible trees.

Thanks!

        Stefan



Re: [compress] Tika test error with IMPLODEd zip

Posted by Emmanuel Bourg <eb...@apache.org>.
On 22/12/2013 12:40, Stefan Bodewig wrote:

> The archive is real, InfoZIP's unzip is willing to extract it without
> any errors.  I've created unit tests from it with svn revision 1552980 -
> not sure when I'll find time to analyze the error.

The issue is now fixed. I used a short instead of an int as an index of
the binary tree, but that wasn't wide enough to work with the biggest
possible trees.

Thank you Gump for spotting this early! :)

Emmanuel Bourg
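
To illustrate why a short index can be too narrow, here is a minimal
sketch. It is not the Commons Compress code: it assumes the tree is stored
as an implicit array (children of slot i at 2i+1 and 2i+2) and uses a depth
of 16 purely as an example. The class and method names are hypothetical.

```java
class TreeIndexWidth {
    /** Number of slots in an array-backed complete binary tree of the given depth. */
    static int slotCount(int depth) {
        return (1 << (depth + 1)) - 1;
    }

    /** Narrows an int index to short, as a short-typed index variable would. */
    static short asShortIndex(int index) {
        return (short) index;
    }

    public static void main(String[] args) {
        int depth = 16;                 // assumed depth, for illustration only
        int slots = slotCount(depth);   // 131071 slots
        int lastIndex = slots - 1;      // 131070, far above Short.MAX_VALUE (32767)

        System.out.println("slots               = " + slots);
        System.out.println("Short.MAX_VALUE     = " + Short.MAX_VALUE);
        System.out.println("last index as int   = " + lastIndex);
        // The cast keeps only the low 16 bits, so the index wraps around.
        System.out.println("last index as short = " + asShortIndex(lastIndex));
    }
}
```

Under these assumptions the narrowed index wraps to a negative number, and
using it to address the array would raise an ArrayIndexOutOfBoundsException.
An int index covers the whole range.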




Re: [compress] Tika test error with IMPLODEd zip

Posted by Stefan Bodewig <bo...@apache.org>.
On 2013-12-22, Stefan Bodewig wrote:

> I'll try to locate the archive they use later and see whether it looks
> genuine but maybe somebody from tika is lurking here and can shed some
> light.

The archive is real, InfoZIP's unzip is willing to extract it without
any errors.  I've created unit tests from it with svn revision 1552980 -
not sure when I'll find time to analyze the error.

Stefan



Re: [compress] Tika test error with IMPLODEd zip

Posted by Stefan Bodewig <bo...@apache.org>.
On 2013-12-22, sebb wrote:

> On 22 December 2013 07:45, Stefan Bodewig <bo...@apache.org> wrote:

>> <http://vmgump.apache.org/gump/public/tika/tika-parsers-test/gump_file/org.apache.tika.parser.pkg.ZipParserTest.txt.html>

>> Of course it is possible the archive isn't real and they used a
>> hex-editor to have an unparseable archive - but it may also be a sign we
>> have a bug in our code.

> Seems to me that the code should still not fail with exceptions such
> as IOOB (or NPE etc) no matter what is thrown at it.
> It should really detect the malformation and report it with a more
> suitable exception.

Yes, that's true.

Stefan



Re: [compress] Tika test error with IMPLODEd zip

Posted by sebb <se...@gmail.com>.
On 22 December 2013 07:45, Stefan Bodewig <bo...@apache.org> wrote:
> Hi
>
> even though Gump stopped nagging, it still works.  Tika is built by Gump
> and they have a test they call testUnsupportedZipCompressionMethod in
> which they seem to use an archive created with an IMPLODED entry.
>
> This test started to fail when Emmanuel added support for IMPLODE, but
> unfortunately it doesn't fail because we can now read the archive; it
> fails because of an IndexOutOfBounds in BinaryTree:
>
> <http://vmgump.apache.org/gump/public/tika/tika-parsers-test/gump_file/org.apache.tika.parser.pkg.ZipParserTest.txt.html>
>
> Of course it is possible the archive isn't real and they used a
> hex-editor to have an unparseable archive - but it may also be a sign we
> have a bug in our code.

Seems to me that the code should still not fail with exceptions such
as IOOB (or NPE etc) no matter what is thrown at it.
It should really detect the malformation and report it with a more
suitable exception.
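
A minimal sketch of that idea follows. It is not the Commons Compress
implementation: the class name, the sentinel values and the array layout
(children of slot i at 2i+1 and 2i+2) are assumptions made for
illustration. Every index is checked before use, and bad input is reported
as an IOException rather than surfacing as an IndexOutOfBoundsException.

```java
import java.io.IOException;

class SafeTreeLookup {
    static final int UNDEFINED = -1; // slot that was never assigned
    static final int NODE = -2;      // internal node with children at 2i+1 and 2i+2

    /** Follows the bits (0 = left, 1 = right) from the root and returns the leaf value. */
    static int decode(int[] tree, int[] bits) throws IOException {
        int node = 0;
        int used = 0;
        while (true) {
            if (node >= tree.length) {
                throw new IOException("Malformed tree: index " + node + " is outside the tree");
            }
            int value = tree[node];
            if (value >= 0) {
                return value; // reached a leaf
            }
            if (value != NODE) {
                throw new IOException("Malformed data: undefined node at index " + node);
            }
            if (used == bits.length) {
                throw new IOException("Truncated data: ran out of bits inside the tree");
            }
            node = 2 * node + 1 + (bits[used++] & 1);
        }
    }

    public static void main(String[] args) throws IOException {
        // Root is a node; its left child is leaf 7; its right child is a node
        // whose left child is leaf 3 and whose right child is undefined.
        int[] tree = {NODE, 7, NODE, UNDEFINED, UNDEFINED, 3, UNDEFINED};
        System.out.println(decode(tree, new int[] {0}));
        System.out.println(decode(tree, new int[] {1, 0}));
        try {
            decode(tree, new int[] {1, 1});
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A caller can then treat a corrupt archive like any other I/O failure
instead of having to catch unchecked runtime exceptions.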

> I'll try to locate the archive they use later and see whether it looks
> genuine but maybe somebody from tika is lurking here and can shed some
> light.
>
> Stefan
