Posted to common-user@hadoop.apache.org by Mark Kerzner <ma...@gmail.com> on 2010/07/19 19:18:15 UTC

HDFS InputStream and ZipFiles

Hi,

I want to attach a comment to each ZipEntry. Writing the comment works
fine. However, when I read the ZipEntry back through ZipInputStream, the
comment is missing. It is only returned when I open the archive with
ZipFile.

On the other hand, HDFS FileSystem insists on using streams. I could copy
the zip file from HDFS to local, but other than that, is there a way to use
ZipFile with HDFS?

Thank you,
Mark

Re: HDFS InputStream and ZipFiles

Posted by "Ankur C. Goel" <ga...@yahoo-inc.com>.
Java's ZipFile does not work off an input stream, so it cannot be used directly with HDFS.
ZipInputStream can work with HDFS, but its usefulness is limited: unlike ZipFile, it cannot seek to arbitrary zip entries, which is what distributed processing needs.
Also, Java's ZipFile implementation does not work on files larger than 4 GB.

There's a JIRA for this - https://issues.apache.org/jira/browse/MAPREDUCE-210
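
To make the difference concrete, here is a minimal sketch using only java.util.zip. Entry comments are stored in the zip central directory at the end of the archive, which ZipInputStream never reads, so getComment() returns null when streaming. The sketch illustrates the copy-to-local route mentioned in the question: it spools any InputStream (with HDFS, presumably the one returned by FileSystem.open) to a temporary file and reads it with ZipFile. The class name, method names, and sample data are made up for illustration, and nothing here touches the Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of Hadoop or the JDK.
class ZipComments {

    /** Streaming read: entry names are found, but every comment is null. */
    public static Map<String, String> commentsViaStream(InputStream in) throws IOException {
        Map<String, String> comments = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Only the local file header is parsed here; it carries no comment.
                comments.put(entry.getName(), entry.getComment());
            }
        }
        return comments;
    }

    /** Workaround: copy the stream to a temp file, then read it with ZipFile. */
    public static Map<String, String> commentsViaTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("zip-copy-", ".zip");
        try {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            Map<String, String> comments = new LinkedHashMap<>();
            try (ZipFile zip = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zip.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = entries.nextElement();
                    // ZipFile reads the central directory, so the comment is present.
                    comments.put(entry.getName(), entry.getComment());
                }
            }
            return comments;
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    /** Builds a one-entry archive in memory, with a comment, for demonstration. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            ZipEntry entry = new ZipEntry("a.txt");
            entry.setComment("hello comment");
            zout.putNextEntry(entry);
            zout.write("data".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

The temp-file copy costs local disk space and a full read of the archive, so it only suits archives that fit comfortably on the local machine.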


