Posted to issues@commons.apache.org by GitBox <gi...@apache.org> on 2019/11/25 01:40:34 UTC

[GitHub] [commons-compress] PeterAlfreadLee opened a new pull request #87: COMPRESS-124 : Add support for extracting sparse entries from tar archives

URL: https://github.com/apache/commons-compress/pull/87
 
 
   [COMPRESS-124](https://issues.apache.org/jira/browse/COMPRESS-124)
   Add support for extracting sparse entries from tar archives, including:
   
   1. Old GNU Format
   The sparse map of the old GNU format is stored in the tar header, which has room for only 4 sparse headers. If more sparse headers exist, they are stored in the extension block.
   
   2. PAX 0.0 Format
   The sparse map of the PAX 0.0 format is stored in tar headers with `GNU.sparse.offset` and `GNU.sparse.numbytes`. Each of these keys may appear more than once, so their values cannot be stored in a plain key/value map.
   
   3. PAX 0.1 Format
   The sparse map of PAX 0.1 Format is stored in tar headers with `GNU.sparse.map`.
   
   4. PAX 1.0 Format
   The sparse map of the PAX 1.0 format is stored at the head of the entry's data block. Therefore we need to handle it separately when we encounter a PAX 1.0 tar.
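
To make the PAX 0.1 case concrete, here is a minimal sketch of turning a `GNU.sparse.map` value into a list of data chunks. It assumes the value is a comma-separated list of alternating offsets and sizes, as described in the GNU Tar Manual. The names `SparseMapSketch`, `Chunk` and `parseMap` are hypothetical and are not taken from this pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the code in this pull request.
class SparseMapSketch {

    /** One chunk of real data: its offset in the expanded file and its length. */
    static final class Chunk {
        final long offset;
        final long numbytes;

        Chunk(long offset, long numbytes) {
            this.offset = offset;
            this.numbytes = numbytes;
        }
    }

    /**
     * Parses a value such as "0,512,4096,1024" into chunks.
     * Assumption: fields alternate between offset and size.
     */
    static List<Chunk> parseMap(String value) {
        List<Chunk> chunks = new ArrayList<>();
        if (value.isEmpty()) {
            return chunks; // no data chunks: the whole file is a hole
        }
        String[] fields = value.split(",");
        if (fields.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of fields in sparse map: " + value);
        }
        for (int i = 0; i < fields.length; i += 2) {
            long offset = Long.parseLong(fields[i].trim());
            long numbytes = Long.parseLong(fields[i + 1].trim());
            if (offset < 0 || numbytes < 0) {
                throw new IllegalArgumentException("negative value in sparse map: " + value);
            }
            chunks.add(new Chunk(offset, numbytes));
        }
        return chunks;
    }
}
```

With this sketch, `parseMap("0,512,4096,1024")` yields two chunks: 512 bytes at offset 0 and 1024 bytes at offset 4096. An ordered list is used rather than a map so that the same shape also fits the PAX 0.0 case, where the offset and numbytes keys repeat.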
   
   The feature is implemented mainly in TarArchiveSparseInputStream. It returns zero bytes when reading the "holes" and the file data from the original tar input stream when reading the sparse content; which of the two applies is decided by the sparse headers. TarArchiveSparseInputStream reads all sparse headers, creates all-zero input streams wrapped in BoundedInputStream for the holes, wraps the actual tar input stream in BoundedInputStream for the data, and combines them into a single TarArchiveSparseInputStream. We can then read directly from TarArchiveSparseInputStream whenever we encounter a sparse tar entry.
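
The composition described above can be sketched with plain `java.io` classes. This is only an illustration of the idea, not the code in this pull request: `SparseStreamSketch`, `ZeroStream`, `Bounded` and `expand` are hypothetical names, `Bounded` is a hand-written stand-in for BoundedInputStream, and the sparse map is assumed to be already parsed into `{offset, numbytes}` pairs sorted by offset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch for illustration only; not the code in this pull request.
class SparseStreamSketch {

    /** Yields a fixed number of zero bytes, standing in for a hole. */
    static final class ZeroStream extends InputStream {
        private long remaining;

        ZeroStream(long size) {
            this.remaining = size;
        }

        @Override
        public int read() {
            if (remaining <= 0) {
                return -1;
            }
            remaining--;
            return 0;
        }
    }

    /** Exposes at most a fixed number of bytes of the wrapped stream and never closes it. */
    static final class Bounded extends InputStream {
        private final InputStream in;
        private long remaining;

        Bounded(InputStream in, long size) {
            this.in = in;
            this.remaining = size;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = in.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }
        // close() is inherited and does nothing, so the shared tar stream stays open.
    }

    /**
     * Builds the expanded view of a sparse entry.
     * Assumption: chunks holds {offset, numbytes} pairs sorted by offset,
     * and realSize is the size of the expanded file.
     */
    static InputStream expand(InputStream data, long[][] chunks, long realSize) {
        List<InputStream> parts = new ArrayList<>();
        long pos = 0;
        for (long[] chunk : chunks) {
            long offset = chunk[0];
            long numbytes = chunk[1];
            if (offset > pos) {
                parts.add(new ZeroStream(offset - pos)); // hole before this chunk
            }
            if (numbytes > 0) {
                parts.add(new Bounded(data, numbytes)); // real data from the archive
            }
            pos = offset + numbytes;
        }
        if (realSize > pos) {
            parts.add(new ZeroStream(realSize - pos)); // trailing hole
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }
}
```

For example, with stored data bytes `1, 2, 3`, chunks `{2, 2}` and `{6, 1}`, and a real size of 8, reading the result to the end gives `0, 0, 1, 2, 0, 0, 3, 0`. `SequenceInputStream` closes each part once it is exhausted, which is why the `Bounded` stand-in deliberately does not forward `close()` to the underlying stream.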
   
   TODO: I noticed in the code that there is some kind of "star sparse data". I didn't find it in the [GNU Tar Manual](https://www.gnu.org/software/tar/manual), so I don't know how to parse that kind of tar.
   
   Please let me know if anything needs to be modified. :-) @bodewig 
