Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2019/11/12 07:09:44 UTC

[GitHub] [commons-compress] PeterAlfreadLee opened a new pull request #86: COMPRESS-477 building a split zip

PeterAlfreadLee opened a new pull request #86: COMPRESS-477 building a split zip
URL: https://github.com/apache/commons-compress/pull/86
 
 
   [COMPRESS-477](https://issues.apache.org/jira/projects/COMPRESS/issues/COMPRESS-477)
   Add support for building a split/spanned zip.
   
   Sample code:
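   ```java
   import java.io.File;
   import java.io.FileInputStream;
   import java.io.IOException;
   import java.util.zip.ZipEntry;

   import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
   import org.apache.commons.compress.archivers.zip.ZipArchiveOutputStream;
   import org.apache.commons.compress.utils.IOUtils;
   import org.junit.Test;

   // dir and getFilesToZip() are assumed to be provided by the enclosing test class
   @Test
   public void buildSplitZipTest() throws IOException {
   	File directoryToZip = getFilesToZip();
   	File outputZipFile = new File(dir, "splitZip.zip");
   	long splitSize = 100 * 1024L; /* 100 KB */
   	// try-with-resources closes the output stream even if adding a file fails
   	try (ZipArchiveOutputStream zipArchiveOutputStream = new ZipArchiveOutputStream(outputZipFile, splitSize)) {
   		addFilesToZip(zipArchiveOutputStream, directoryToZip);
   	}
   	// TODO: validate the created zip files when extracting split zip is merged into master
   }
   
   // Recursively adds a file, or every file below a directory, as DEFLATED entries
   private void addFilesToZip(ZipArchiveOutputStream zipArchiveOutputStream, File fileToAdd) throws IOException {
   	if (fileToAdd.isDirectory()) {
   		File[] children = fileToAdd.listFiles();
   		if (children == null) { // listFiles() returns null on an I/O error
   			throw new IOException("Cannot list directory " + fileToAdd);
   		}
   		for (File file : children) {
   			addFilesToZip(zipArchiveOutputStream, file);
   		}
   	} else {
   		ZipArchiveEntry zipArchiveEntry = new ZipArchiveEntry(fileToAdd.getPath());
   		zipArchiveEntry.setMethod(ZipEntry.DEFLATED);
   
   		zipArchiveOutputStream.putArchiveEntry(zipArchiveEntry);
   		// close the input file once its content has been copied
   		try (FileInputStream in = new FileInputStream(fileToAdd)) {
   			IOUtils.copy(in, zipArchiveOutputStream);
   		}
   		zipArchiveOutputStream.closeArchiveEntry();
   	}
   }
   ```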
   
   This PR adds a new class, `ZipSplitOutputStream`. The main points of the implementation are:
   1. The constructor of `ZipSplitOutputStream` writes the zip split signature to the zip file by calling `writeZipSplitSignature`;
   2. Per the zip specification, the split size must be between 64K and 4,294,967,295 bytes;
   3. The split zip files are named .z01, .z02, ... , .z(N-1), .zip ONLY IF there is more than one split segment;
   4. If the split size is big enough (that is, bigger than the actual zip size), there is only one segment, and its suffix is .zip;
   5. A new zip split segment is created when the data to write would exceed the split size; the segments are named in sequence: .z01, .z02, ..., .z99, .z100, .z101, ... , .zip;
   6. Per the zip specification, the End Of Central Directory (EOCD) and the Zip64 End Of Central Directory Locator (Zip64_EOCDL) must reside on the same segment, so `ZipSplitOutputStream` creates a new segment before writing them if the remaining space in the current segment is not enough;
   7. When a `ZipArchiveOutputStream` is created with a split size, it builds a split zip instead of a normal zip. Because `ZipSplitOutputStream` needs the file name to create new split segments, the constructor is `public ZipArchiveOutputStream(final File file, final long zipSplitSize)`;
   8. The disk number, relative offset, number of this disk, number of Central Directories on this disk, total number of disks in Central Directory, Zip64 End Of Central Directory, Zip64 End Of Central Directory Locator, and End Of Central Directory have all been set to the correct values when writing a split/spanned zip;
   9. The test cases need to be updated once [#84](https://github.com/apache/commons-compress/pull/84) is merged, because it seems I cannot test the split/spanned zip I created on Linux. I have tested it on Windows and it works well;
   10. This PR has some minor conflicts with [#84](https://github.com/apache/commons-compress/pull/84); I will resolve them as soon as [#84](https://github.com/apache/commons-compress/pull/84) is merged.
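   
   To make points 2 to 6 concrete, here is a minimal sketch of the size check, the segment naming, and the "does this still fit on the current segment" decision. It is not the code in this PR: the class `SplitZipSketch` and the methods `checkSplitSize`, `segmentName` and `needsNewSegment` are hypothetical names chosen for illustration, "64K" is assumed to mean 64 * 1024 bytes, and the total number of segments is assumed to be known up front purely to keep the example short.
   
   ```java
   import java.util.Locale;

   /** Illustrative helpers only; hypothetical names, not part of commons-compress. */
   final class SplitZipSketch {
       // Limits quoted in point 2 (assumption: 64K means 64 * 1024 bytes)
       static final long MIN_SPLIT_SIZE = 64 * 1024L;
       static final long MAX_SPLIT_SIZE = 4_294_967_295L;

       /** Rejects split sizes outside the range allowed by the zip specification. */
       static void checkSplitSize(long splitSize) {
           if (splitSize < MIN_SPLIT_SIZE || splitSize > MAX_SPLIT_SIZE) {
               throw new IllegalArgumentException("split size must be between "
                       + MIN_SPLIT_SIZE + " and " + MAX_SPLIT_SIZE + " bytes: " + splitSize);
           }
       }

       /**
        * Name of segment number index (1-based) out of total segments:
        * the last segment gets .zip, earlier ones get .z01, .z02, ..., .z100, ...
        */
       static String segmentName(String baseName, int index, int total) {
           if (index < 1 || index > total) {
               throw new IllegalArgumentException("segment index out of range: " + index);
           }
           if (index == total) {
               return baseName + ".zip";
           }
           // %02d pads to at least two digits, so 100 becomes .z100
           return baseName + String.format(Locale.ROOT, ".z%02d", index);
       }

       /** True if bytesNeeded contiguous bytes no longer fit in the current segment. */
       static boolean needsNewSegment(long splitSize, long writtenInSegment, long bytesNeeded) {
           return bytesNeeded > splitSize - writtenInSegment;
       }

       public static void main(String[] args) {
           long splitSize = 100 * 1024L;
           checkSplitSize(splitSize);
           System.out.println(segmentName("splitZip", 1, 3)); // splitZip.z01
           System.out.println(segmentName("splitZip", 3, 3)); // splitZip.zip
           // 42 bytes: a 22-byte EOCD without comment plus a 20-byte Zip64 EOCD locator
           System.out.println(needsNewSegment(splitSize, splitSize - 10, 42)); // true
       }
   }
   ```
   
   With a single segment, `segmentName("splitZip", 1, 1)` yields `splitZip.zip`, which matches point 4.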
   
   Please feel free to let me know if the code needs to be refactored or rebased. I'm looking forward to your reviews. :-) @bodewig @garydgregory 
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services