Posted to user@commons.apache.org by Marian Schedenig <ms...@gmx.net> on 2009/04/18 01:50:52 UTC

Compress: Reading archives within archives

Hi!

I'm using Commons Compress to generate a list of all file names within an
archive. This should also parse any archive files within the archive to get a
list of all files.

However, I can't quite read the inner archive. For the outer archive, I have
a file input stream "in", possibly a compressor input stream "cin", and an
archive input stream "ain". Now whenever the next archive entry from ain
turns out to be an archive file, I have to create a new (possibly)
compressor input stream and (definitely) archive input stream for the sub
archive. Depending on which input stream I pass to the factory, I get two
different errors:

1) Pass archive input stream "ain" to the factory:
java.lang.IllegalArgumentException: Mark is not supported.
	at
org.apache.commons.compress.archivers.ArchiveStreamFactory.createArchiveInputStream(ArchiveStreamFactory.java:152)

2) Pass file input stream "in" to the factory:
This actually manages to get the file name, size and modification date of
the first file within the sub archive (at least for ZIP files). However, I
then get this exception:

java.util.zip.ZipException: oversubscribed dynamic bit lengths tree
	at
org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:236)

And the input stream is broken for the archive input stream of the outer
archive.

Am I doing something wrong? Is this a bug? Or is it at this time simply not
possible to create archive input streams from "live" archive input streams
without first decompressing the inner archive to a temp file?
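
For reference, this is roughly what the relevant code looks like (a simplified
sketch - the file name and the ".zip" check are just placeholders):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.compress.archivers.ArchiveEntry;
import org.apache.commons.compress.archivers.ArchiveInputStream;
import org.apache.commons.compress.archivers.ArchiveStreamFactory;
import org.apache.commons.compress.compressors.CompressorStreamFactory;

public class ListNestedArchives {
    public static void main(String[] args) throws Exception {
        // "backup.tar.gz" is just a placeholder name for this sketch
        InputStream in = new BufferedInputStream(new FileInputStream("backup.tar.gz"));
        // "cin" is only needed when the outer archive is compressed (tar.gz, bz2)
        InputStream cin = new BufferedInputStream(
                new CompressorStreamFactory().createCompressorInputStream(in));
        // "ain" is the archive input stream for the outer archive
        ArchiveInputStream ain = new ArchiveStreamFactory().createArchiveInputStream(cin);

        ArchiveEntry entry;
        while ((entry = ain.getNextEntry()) != null) {
            System.out.println(entry.getName());
            if (entry.getName().endsWith(".zip")) {
                // here I need a new (possibly compressor and) archive input stream
                // for the sub archive - this is where I get the errors described above
            }
        }
        ain.close();
    }
}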

Thx,
Marian.



Re: Compress: Reading archives within archives

Posted by Bear Giles <bg...@coyotesong.com>.
Marian Schedenig wrote:
> Very interesting use case, I hadn't thought about nested archives.
>   
Virus scanners and the like will check nested archives.  Hardcore 
scanners can check files that you don't normally think of as archives, 
but which allow arbitrary information to be embedded within them.  Image 
files, executable objects, etc.



Re: Compress: Reading archives within archives

Posted by Marian Schedenig <ms...@gmx.net>.

Christian Grobmeier wrote:
> 
> Yes, Compress was a sandbox component before. It's just graduated - and
> there are some todos left, for example the website. Those will be fixed
> with the first official release.

Ah, that's fine then. I was distraught when the downloads didn't work and I
couldn't find any contact info on the website.

Great news about the "promotion", too. :)

Cheers,
Marian.



Re: Compress: Reading archives within archives

Posted by Christian Grobmeier <gr...@gmail.com>.
Hi,

> I nearly didn't - the nightly build directory linked to on the website only
> shows md5 files, and the SVN link on the main page gives a 404 error and a
> Python traceback. Fortunately, I discovered a working SVN link in the Wiki,
> as Compress does exactly what I need (zip, tar.gz and bzip2).

Yes, Compress was a sandbox component before. It's just graduated - and
there are some todos left, for example the website. Those will be fixed
with the first official release.

> With buffering, I now get exactly one of the "oversubscribed dynamic bit
> lengths tree" messages, but this seems to be related to a password-protected
> ZIP file within my main ZIP. Compress manages to read the first entry
> (ArchiveEntry, that is... I'm not trying to get the content from these
> files, and with an encrypted ZIP, that obviously couldn't work) from this
> ZIP before giving this exception. But all the remaining ZIPs from the same
> main ZIP are parsed correctly, so I think everything's working correctly
> now.
>
> Thanks a lot!

Thanks too!
I added a test case for this:
http://svn.apache.org/viewvc/commons/proper/compress/trunk/src/test/java/org/apache/commons/compress/archivers/ZipTestCase.java?view=markup&pathrev=766447

Cheers,
Christian



Re: Compress: Reading archives within archives

Posted by Marian Schedenig <ms...@gmx.net>.

Christian Grobmeier wrote:
> 
> thanks for using compress!
> 

I nearly didn't - the nightly build directory linked to on the website only
shows md5 files, and the SVN link on the main page gives a 404 error and a
Python traceback. Fortunately, I discovered a working SVN link in the Wiki,
as Compress does exactly what I need (zip, tar.gz and bzip2).


> Very interesting use case, I hadn't thought about nested archives.

Neither did I, until I stumbled upon the first such case in my own backups.
;)


> This happens if you use the autodetect feature of the factories.
> Please wrap your input stream in a BufferedInputStream (or anything
> else which supports mark - I don't know of others :-)) and the error
> should disappear.

It does! I had the main stream (the FileInputStream) wrapped in a
BufferedInputStream, but not the ArchiveInputStreams I got from it.
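
In case it helps anyone else, the nested part now looks roughly like this
(same imports as the sketch in my first mail; detecting sub archives by the
".zip" extension is just my simplistic approach):

// "ain" is the ArchiveInputStream of the enclosing archive
static void listEntries(ArchiveInputStream ain, String prefix) throws Exception {
    ArchiveEntry entry;
    while ((entry = ain.getNextEntry()) != null) {
        System.out.println(prefix + entry.getName());
        if (entry.getName().endsWith(".zip")) {
            // wrap ain so the factory's autodetection can use mark()/reset(),
            // then recurse into the sub archive; the inner stream is not closed,
            // since that would close the outer stream as well
            ArchiveInputStream inner = new ArchiveStreamFactory()
                    .createArchiveInputStream(new BufferedInputStream(ain));
            listEntries(inner, prefix + entry.getName() + "!");
        }
    }
}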

With buffering, I now get exactly one of the "oversubscribed dynamic bit
lengths tree" messages, but this seems to be related to a password-protected
ZIP file within my main ZIP. Compress manages to read the first entry
(ArchiveEntry, that is... I'm not trying to get the content from these
files, and with an encrypted ZIP, that obviously couldn't work) from this
ZIP before giving this exception. But all the remaining ZIPs from the same
main ZIP are parsed correctly, so I think everything's working correctly
now.

Thanks a lot!

Cheers,
Marian.



Re: Compress: Reading archives within archives

Posted by Christian Grobmeier <gr...@gmail.com>.
Hi Marian,

thanks for using compress!

> I'm using Commons Compress to generate a list of all file names within an
> archive. This should also parse any archive files within the archive to get a
> list of all files.
>
> However, I can't quite read the inner archive. For the outer archive, I have
> a file input stream "in", possibly a compressor input stream "cin", and an
> archive input stream "ain". Now whenever the next archive entry from ain
> turns out to be an archive file, I have to create a new (possibly)
> compressor input stream and (definitely) archive input stream for the sub
> archive. Depending on which input stream I pass to the factory, I get two
> different errors:

Very interesting use case, I hadn't thought about nested archives.

> 1) Pass archive input stream "ain" to the factory:
> java.lang.IllegalArgumentException: Mark is not supported.
>        at

This happens if you use the autodetect feature of the factories.
Please wrap your input stream in a BufferedInputStream (or anything
else which supports mark - I don't know of others :-)) and the error
should disappear.
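
Something like this (untested sketch, the file name is only an example):

// the factory's autodetection needs mark()/reset(), which BufferedInputStream provides
InputStream in = new BufferedInputStream(new FileInputStream("example.zip"));
ArchiveInputStream ais = new ArchiveStreamFactory().createArchiveInputStream(in);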


> 2) Pass file input stream "in" to the factory:
> This actually manages to get the file name, size and modification date of
> the first file within the sub archive (at least for ZIP files). However, I
> then get this exception:
>
> java.util.zip.ZipException: oversubscribed dynamic bit lengths tree
>        at
> org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.read(ZipArchiveInputStream.java:236)
>
> And the input stream is broken for the archive input stream of the outer
> archive.

I have to check this out myself - but I assume this happens because of
the random file access we use in the Zip classes. That means we read the
central directory of the zip archive first, which is only possible if
we have the file completely. I am guessing this is the problem here,
but maybe Stefan (who did tons of work on the zip classes) has more
ideas.
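
For comparison, the random access way of reading a zip is the ZipFile class,
roughly like this - it only works with a complete file on disk (the file name
is only an example):

import java.io.File;
import java.util.Enumeration;

import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;

// ZipFile reads the central directory first, so it needs the whole file
ZipFile zip = new ZipFile(new File("outer.zip"));
try {
    Enumeration<?> entries = zip.getEntries();
    while (entries.hasMoreElements()) {
        ZipArchiveEntry ze = (ZipArchiveEntry) entries.nextElement();
        System.out.println(ze.getName());
    }
} finally {
    zip.close();
}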


> Am I doing something wrong? Is this a bug? Or is it at this time simply not
> possible to create archive input streams from "live" archive input streams
> without first decompressing the inner archive to a temp file?

I am quite sure it's possible if you use option 1 with a
BufferedInputStream - let me know if that works.

Thanks,
Christian

