Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/01/22 23:15:02 UTC

[GitHub] [incubator-daffodil] bsloane1650 commented on pull request #480: WIP: First cut at a Zip layer transform

bsloane1650 commented on pull request #480:
URL: https://github.com/apache/incubator-daffodil/pull/480#issuecomment-765737882


   Some high-level comments:
   
   1) It feels like we should ship a schema for our self-defined ZIP header type, so that the user's schema can be something like:
   
   ```xml
   <xs:element name="zipEntry" maxOccurs="unbounded" dfdl:occursCountKind="implicit">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="header" type="dfdlx:ZipHeader"/>
         ...
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   ```
   
   In addition to being easier for users, this should also give us more room to change the header format later while staying backward compatible.
   
   It also feels wrong that we are passing the metadata in-band in the first place, but I don't see any way around that without somewhat redesigning DFDL.
   
   2) Silently dropping deleted entries seems like a limitation to me as well. From a DFDL interface standpoint, supporting deleted entries is straightforward: just add a boolean field to the entry header indicating whether the file was deleted. From an implementation standpoint, this probably means we would need to roll our own ZIP implementation.
   This is not something we need to address, but it is worth listing explicitly as a limitation.
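   For (2), a minimal Java sketch of the bookkeeping a hand-rolled reader would need in order to surface deleted entries, assuming they show up as local file headers that no central directory record points to. It assumes a non-ZIP64 archive held entirely in memory; the class and method names are hypothetical and not part of this PR. The signature scan is deliberately naive: signature bytes that happen to occur inside entry data would also be reported, so a real implementation would need to walk entries by their declared sizes instead.

   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.util.ArrayList;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;

   /** Hypothetical helper: finds local file headers not referenced by the central directory. */
   final class OrphanedEntries {
       static final int LOCAL_SIG = 0x04034b50;   // "PK\3\4" local file header
       static final int CENTRAL_SIG = 0x02014b50; // "PK\1\2" central directory header
       static final int EOCD_SIG = 0x06054b50;    // "PK\5\6" end of central directory

       static List<Integer> findOrphanedLocalHeaders(byte[] zip) {
           ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

           // The EOCD record is 22 bytes followed by a comment of up to 65535 bytes,
           // so search backwards from the end of the file for its signature.
           int eocd = -1;
           for (int i = zip.length - 22; i >= Math.max(0, zip.length - 22 - 0xFFFF); i--) {
               if (buf.getInt(i) == EOCD_SIG) {
                   eocd = i;
                   break;
               }
           }
           if (eocd < 0) {
               throw new IllegalArgumentException("no end of central directory record");
           }
           int entryCount = buf.getShort(eocd + 10) & 0xFFFF; // total central directory records
           int cdOffset = buf.getInt(eocd + 16);               // ZIP64 (0xFFFFFFFF) not handled

           // Record the local header offset that each central directory entry points to.
           Set<Integer> referenced = new HashSet<>();
           int p = cdOffset;
           for (int n = 0; n < entryCount; n++) {
               if (buf.getInt(p) != CENTRAL_SIG) {
                   throw new IllegalArgumentException("malformed central directory at " + p);
               }
               int nameLen = buf.getShort(p + 28) & 0xFFFF;
               int extraLen = buf.getShort(p + 30) & 0xFFFF;
               int commentLen = buf.getShort(p + 32) & 0xFFFF;
               referenced.add(buf.getInt(p + 42));
               p += 46 + nameLen + extraLen + commentLen;
           }

           // Naive scan: any local header signature before the central directory
           // that nothing references is a candidate deleted (orphaned) entry.
           int scanEnd = Math.min(cdOffset, eocd);
           List<Integer> orphans = new ArrayList<>();
           for (int i = 0; i + 4 <= scanEnd; i++) {
               if (buf.getInt(i) == LOCAL_SIG && !referenced.contains(i)) {
                   orphans.add(i);
               }
           }
           return orphans;
       }
   }
   ```

   Exposing such entries through the proposed boolean header field would then be a matter of emitting one extra entry per orphan with the flag set.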
   
   3) Similar to (2), we silently drop dead space from the ZIP file. This is probably even less important than (2), but it is still a limitation if anyone wants to use this for forensic purposes.
   
   4) If I understand the ZIP format correctly, there is a global comment (stored in the end of central directory record) that applies to the entire ZIP file, which we do not expose. We may want to include a global ZIP header before the first entry header.
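   For (4), a short Java sketch of reading the archive-level comment with the standard `java.util.zip.ZipFile` API and bundling it, together with the entry count, into a hypothetical `GlobalZipHeader` record standing in for the global header suggested above. `ZipFile` reads the central directory and the end of central directory record, which is why it can see the comment; a purely sequential `java.util.zip.ZipInputStream` has no equivalent method. The class and record names are illustrative only, and the record requires Java 16 or later.

   ```java
   import java.io.IOException;
   import java.nio.file.Path;
   import java.util.zip.ZipFile;

   final class ZipGlobalHeaderReader {
       /** Hypothetical archive-level header that could precede the first entry header. */
       record GlobalZipHeader(String comment, int entryCount) {}

       static GlobalZipHeader readGlobalHeader(Path zipPath) throws IOException {
           // ZipFile parses the central directory and the end of central directory
           // record, where the archive comment lives. getComment() returns null
           // when the archive has no comment.
           try (ZipFile zf = new ZipFile(zipPath.toFile())) {
               return new GlobalZipHeader(zf.getComment(), zf.size());
           }
       }
   }
   ```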
   
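   5) A maliciously crafted ZIP file might be able to cause all sorts of problems, for example a decompression bomb whose entries inflate to an enormous size, or local headers that disagree with the central directory.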
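   For (5), a minimal Java sketch of a decompression-bomb guard, assuming the layer inflates entries through `java.util.zip.ZipInputStream` (an assumption; this PR may do it differently). It counts the bytes actually produced by inflation and fails once a caller-chosen limit is exceeded, rather than trusting the sizes declared in the headers. The class name and the `maxTotalBytes` parameter are hypothetical.

   ```java
   import java.io.IOException;
   import java.io.InputStream;
   import java.util.zip.ZipEntry;
   import java.util.zip.ZipInputStream;

   final class BoundedUnzip {
       /**
        * Inflates every entry and returns the total number of bytes produced,
        * throwing once that total exceeds maxTotalBytes. Closes the input stream.
        */
       static long inflateWithLimit(InputStream in, long maxTotalBytes) throws IOException {
           long total = 0;
           byte[] buf = new byte[8192];
           try (ZipInputStream zis = new ZipInputStream(in)) {
               ZipEntry entry;
               while ((entry = zis.getNextEntry()) != null) {
                   int n;
                   // Count bytes as they are inflated; declared sizes are attacker-controlled.
                   while ((n = zis.read(buf)) != -1) {
                       total += n;
                       if (total > maxTotalBytes) {
                           throw new IOException("inflated size exceeds " + maxTotalBytes
                                   + " bytes while reading entry " + entry.getName());
                       }
                   }
               }
           }
           return total;
       }
   }
   ```

   The same loop could also cap the number of entries or the size of any single entry; which limits make sense for a DFDL layer is a separate design question.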

