Posted to dev@tomcat.apache.org by Apache Wiki <wi...@apache.org> on 2015/04/05 23:25:15 UTC

[Tomcat Wiki] Update of "Development/NestedFilesystem" by jboynes

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.

The "Development/NestedFilesystem" page has been changed by jboynes:
https://wiki.apache.org/tomcat/Development/NestedFilesystem

New page:
Java uses the JAR file format for packaging application components. Originally used simply for packaging classes and their associated resources, it is now used for package types that allow embedding of other packages such as:

 * Web applications (WAR files) that may contain JAR files with classes and/or web fragments
 * Resource adapters (J2CA RAR files) that may contain JAR files with classes or native libraries
 * Enterprise archives (EAR files) that may contain JAR files, WARs or RARs (with their embedded JARs and libraries)

This nesting is typically handled by expanding the packages onto the filesystem, where they can be accessed using the standard JDK APIs. This has the advantage that every resource contained in the package can be identified by a URL using a scheme supported directly by the JDK (either the "file" protocol or the "jar" protocol). However, it requires a writable filesystem with space to hold the extracted packages, and the extraction takes time.

To avoid unpacking the archive, alternative mechanisms have been built that use custom URLs and !ClassLoader implementations to access its content. Examples of these are the "jndi" scheme used in previous versions of Tomcat and the "onejar" scheme used by the One-Jar project. These custom schemes may not be recognized by framework libraries and may be handled incorrectly or inefficiently. The problem is compounded when a scheme derives from the "jar" scheme, whose non-hierarchical URIs require special handling.

This proposal explores an alternative implementation based on the use of the NIO !FileSystem library introduced in Java 7.

A prototype implementation is available in Tomcat's sandbox at http://svn.apache.org/viewvc/tomcat/sandbox/niofs/

= Requirements =

The design is predicated on the ability to create a !FileSystem that provides a fully functional view of an archive's content from a Path referring to that archive. Paths to entries in that !FileSystem may be used as the basis for other archive !FileSystems. Essentially, an archive can be mounted as a !FileSystem, and any archives it contains can in turn be mounted to form a nested hierarchy of !FileSystems.

== Functional Requirements ==
 * A !FileSystem view of an archive may be created by calling the newFileSystem(Path) method on the provider.
   * The !FileSystem underlying the Path must support random access via the !SeekableByteChannel returned from newByteChannel()
 * The provider's newByteChannel() operation must return a !SeekableByteChannel that supports random access
 * A !FileSystem view of an archive may be created by calling the newFileSystem(URI) method on the provider.
   * It must be possible to convert the URI to a Path using the Paths.get(URI) API.
   * The !FileSystem backing such a Path must meet the constraints defined for newFileSystem(Path)
 * The URIs for Paths returned by the provider must use standard URI syntax and support resolving of relative references

== Non-Functional Requirements ==
 * The provider will be identified by the URI scheme "archive"
 * The provider should avoid unnecessary buffering of data in memory or on disk
   * Buffering modes should be configurable by the user
 * Performance should be comparable to that achievable by extracting the archive to disk
   * Mount performance should be comparable to the time and resources taken to extract the archive's content
   * File open performance should be comparable to the time taken to open a file on the default filesystem
   * File read performance should be comparable to the time taken to read from a file on the default filesystem
   * File seek performance should be comparable to the time taken to position within a file on the default filesystem

= Implementation =

== Zip Structure ==
PKWARE's documentation on the format can be found at http://www.pkware.com/documents/casestudies/APPNOTE.TXT

A Zip file is organized as a series of file entries, each consisting of a header followed by data. These are followed by a series of "central directory" entries that reference the individual file entries, and finally by an "end of central directory" (EOCD) record that locates the central directory. An application wishing to access a random entry must work backwards from the end of the file to locate the EOCD record, seek to and scan the central directory entries, then seek to the individual file entry.
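
The backwards search for the EOCD record can be sketched as follows. This is only an illustration and is not taken from the sandbox prototype: the class and method names are hypothetical, the archive is assumed to fit in a byte array, and Zip64 and multi-disk archives are ignored. The field offsets follow the EOCD layout in PKWARE's APPNOTE.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes 50 4B 05 06 read little-endian
    static final int EOCD_MIN_SIZE = 22;          // fixed part of the record, without comment
    static final int MAX_COMMENT = 0xFFFF;        // comment length is a 16-bit field

    /** Returns the offset of the EOCD record, or -1 if none is found. */
    static int findEocd(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int last = archive.length - EOCD_MIN_SIZE;
        int first = Math.max(0, last - MAX_COMMENT);
        // Scan backwards from the last position where a record could start.
        for (int pos = last; pos >= first; pos--) {
            if (buf.getInt(pos) != EOCD_SIGNATURE) {
                continue;
            }
            // Assumption: accept the record only if its comment runs to end of file.
            int commentLength = buf.getShort(pos + 20) & 0xFFFF;
            if (pos + EOCD_MIN_SIZE + commentLength == archive.length) {
                return pos;
            }
        }
        return -1;
    }

    /** Total number of central directory entries recorded in the EOCD. */
    static int entryCount(byte[] archive, int eocd) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getShort(eocd + 10) & 0xFFFF;
    }

    /** Offset of the first central directory entry, relative to the archive start. */
    static long centralDirectoryOffset(byte[] archive, int eocd) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        return buf.getInt(eocd + 16) & 0xFFFFFFFFL;
    }

    /** The smallest valid archive: an EOCD record with no entries and no comment. */
    static byte[] emptyArchive() {
        return ByteBuffer.allocate(EOCD_MIN_SIZE).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(0, EOCD_SIGNATURE).array();
    }

    public static void main(String[] args) {
        byte[] archive = emptyArchive();
        int eocd = findEocd(archive);
        System.out.println("EOCD at " + eocd + ", entries " + entryCount(archive, eocd));
    }
}
```

A provider working over a !SeekableByteChannel would read only the tail of the archive into a buffer rather than the whole file, but the scan itself would be the same.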

Individual file entries may be uncompressed (i.e. STORED) or compressed using the DEFLATE algorithm (the Zip format allows other methods, but the JDK only supports DEFLATE). Data in STORED entries may be accessed directly once the entry's offset within the archive has been retrieved from the central directory entry. However, DEFLATE stores data as a series of blocks of unknown length, so positioning within a deflated entry may involve following the block chain from the beginning.
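
As a rough sketch of the STORED case: the central directory gives the offset of the entry's local header, and the data starts after that header's fixed 30 bytes plus its variable-length name and extra fields. The class and method names below are hypothetical and the archive is assumed to be available as a !ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class StoredEntryOffsets {
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50; // bytes 50 4B 03 04
    static final int LOCAL_HEADER_FIXED_SIZE = 30;

    /**
     * Returns the offset of the first data byte of the entry whose local
     * header starts at headerOffset (the value taken from the central directory).
     */
    static long dataOffset(ByteBuffer archive, int headerOffset) {
        // Work on a duplicate so the caller's position and byte order are untouched.
        ByteBuffer buf = archive.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(headerOffset) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("No local file header at " + headerOffset);
        }
        // The lengths are read from the local header itself because its extra
        // field may differ in length from the copy in the central directory.
        int nameLength = buf.getShort(headerOffset + 26) & 0xFFFF;
        int extraLength = buf.getShort(headerOffset + 28) & 0xFFFF;
        return (long) headerOffset + LOCAL_HEADER_FIXED_SIZE + nameLength + extraLength;
    }

    public static void main(String[] args) {
        // A fabricated header at offset 5 with a 3-byte name and a 2-byte extra field.
        ByteBuffer sample = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        sample.putInt(5, LOCAL_HEADER_SIGNATURE);
        sample.putShort(5 + 26, (short) 3);
        sample.putShort(5 + 28, (short) 2);
        System.out.println("data starts at " + dataOffset(sample, 5));
    }
}
```

For a STORED entry, a read at position p within the entry then maps to a read at dataOffset + p in the archive, which is what makes cheap random access possible. No equivalent arithmetic exists for DEFLATE entries.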

Zip files may or may not contain entries corresponding to folders in the filesystem. This is typically transparent to applications using a !ClassLoader to load classes or resources, but to provide a !FileSystem view these nodes must be synthesized if not present.
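
One possible way to work out which folder nodes need synthesizing is to derive every parent prefix from the entry names and drop those that already have an explicit entry. The sketch below assumes entry names use "/" separators and that folder entries end in "/"; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class DirectorySynthesizer {
    /** Returns the folder names implied by the entries but absent from the archive. */
    static SortedSet<String> missingDirectories(Collection<String> entryNames) {
        SortedSet<String> implied = new TreeSet<>();
        for (String name : entryNames) {
            // Every prefix that ends in '/' is a folder on the way to this entry.
            int slash = name.indexOf('/');
            while (slash >= 0) {
                implied.add(name.substring(0, slash + 1));
                slash = name.indexOf('/', slash + 1);
            }
        }
        // Folders that have their own entry do not need to be synthesized.
        implied.removeAll(entryNames);
        return implied;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "WEB-INF/", "WEB-INF/lib/a.jar", "META-INF/MANIFEST.MF", "index.html");
        System.out.println(missingDirectories(names));
    }
}
```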

Zip files may contain "zombie" entries that are not referenced by the central directory. These can be created when a Zip file is updated to replace or remove existing items. An application that scans a Zip file sequentially may handle this incorrectly (returning the older or deleted entry) unless it goes on to scan the entire archive and verify that the entry still appears in the central directory; because that is inherently inefficient, most do not. In practice, application packages are generally not modified after the initial build, so this error is unlikely.

Zip files may contain data in addition to the archive entries, such as executable code for self-extracting archives or text comments describing the archive.

== URI Structure ==



== ToDos ==
= Limitations in standard JDK APIs =

== Zip Handling ==

The JDK APIs dealing with Zip archives have not been updated to work with the NIO File APIs:
 * ZipFile's constructor only accepts a java.io.File or a String relating to a file on the default filesystem
 * A zip entry may only be accessed as a sequential !InputStream rather than a !SeekableByteChannel
 * A !ZipInputStream may only be constructed over an !InputStream rather than a !SeekableByteChannel
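
To illustrate what these limitations mean for nested archives, the sketch below reads an embedded archive without unpacking it by wrapping one !ZipInputStream in another. Both streams are strictly sequential, so finding an entry means scanning from the start of each archive. The class and method names are hypothetical and the sample archive is built in memory purely for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class NestedZipScan {
    /** Builds a small in-memory archive holding a single entry. */
    static byte[] zipOf(String name, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Lists the entries of the archive stored under nestedName in the outer archive. */
    static List<String> listNested(InputStream outer, String nestedName) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream outerZip = new ZipInputStream(outer)) {
            // Sequential scan of the outer archive: no way to jump to the entry.
            for (ZipEntry e = outerZip.getNextEntry(); e != null; e = outerZip.getNextEntry()) {
                if (!e.getName().equals(nestedName)) {
                    continue;
                }
                // The nested archive can only be read as a stream over the outer
                // entry; it is left unclosed here because the outer stream owns it.
                ZipInputStream innerZip = new ZipInputStream(outerZip);
                for (ZipEntry inner = innerZip.getNextEntry(); inner != null;
                        inner = innerZip.getNextEntry()) {
                    names.add(inner.getName());
                }
                break;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] jar = zipOf("com/example/Foo.class", new byte[] {1, 2, 3});
        byte[] war = zipOf("WEB-INF/lib/a.jar", jar);
        System.out.println(listNested(new ByteArrayInputStream(war), "WEB-INF/lib/a.jar"));
    }
}
```

Because a !ZipInputStream never consults the central directory, this approach is also the one exposed to the "zombie" entries described earlier.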

The JDK implementation of Zip support uses the native zlib library and maps the archive into memory for direct access and performance. This has implications:
 * The archive content must be accessible from native code
 * Memory mapping a file on some operating systems (e.g. Microsoft Windows) asserts a mandatory file lock which interferes with the "overwrite to re-deploy" mechanism often used in development environments

== URL Support ==

The jar scheme syntax is now [[https://www.iana.org/assignments/uri-schemes/prov/jar|formally defined]] as:
{{{
jar:<url>!/[<entry>]
}}}

The JDK libraries such as JarURLConnection do not permit the <url> component to be another jar: URL; nesting is specifically not supported.

As this does not comply with the syntax rules for standard hierarchical URIs, custom parsing code is required in order to perform URL manipulation. For example, to resolve a relative URI such as a class reference, the jar: URL must be parsed to extract and manipulate the <entry> component.
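
The kind of custom parsing this requires can be sketched with java.net.URI, which treats a jar: URI as opaque and therefore cannot resolve a relative reference against it directly. The class and method names below are hypothetical, and the example URI is made up for illustration.

```java
import java.net.URI;

// Hypothetical helper, for illustration only.
final class JarUriResolver {
    /** Resolves a relative reference against the entry part of a jar: URI. */
    static URI resolveInJar(URI jarUri, String relative) {
        String text = jarUri.toString();
        int separator = text.lastIndexOf("!/");
        if (!"jar".equals(jarUri.getScheme()) || separator < 0) {
            throw new IllegalArgumentException("Not a jar: URI: " + jarUri);
        }
        String archivePart = text.substring(0, separator + 1);   // "jar:<url>!"
        URI entry = URI.create(text.substring(separator + 1));   // "/<entry>"
        // The entry part on its own is hierarchical, so standard resolution works.
        return URI.create(archivePart + entry.resolve(relative));
    }

    public static void main(String[] args) {
        URI base = URI.create("jar:file:/app.war!/WEB-INF/classes/com/Foo.class");
        // The URI class returns the relative reference unchanged for an opaque base.
        System.out.println("opaque: " + base.isOpaque());
        System.out.println("naive:  " + base.resolve("Bar.class"));
        System.out.println("custom: " + resolveInJar(base, "Bar.class"));
    }
}
```

Splitting on the last "!/" is itself an assumption about how the URL was built, which is part of why nested jar: URLs are awkward to handle.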

JarURLConnection's getJarFile API returns a !JarFile which has the same issues described in [[#Zip Handling]].

== Built-in "jar" FileSystemProvider ==
To provide an illustrative example of a !FileSystemProvider, Sun/Oracle released a demo "ZipFS" for working with Zip archives and a version of this is included in the JDK. This implementation inherits some of the limitations from above:
 * The archive must be located on the default !FileSystem
 * It uses "jar:" URIs and does not support nesting
 * The !SeekableByteChannel returned by newByteChannel does not support seek operations
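
The behaviour of the built-in provider can be checked with a small probe such as the one below. It mounts a temporary archive through a "jar:" URI and tries to reposition the channel for an entry. The class and method names are hypothetical, and the outcome of the seek probe may differ between JDK releases, so no particular result is assumed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
final class ZipFsProbe {
    /** Writes a one-entry archive to a temporary file on the default filesystem. */
    static Path writeSampleArchive() {
        try {
            Path zip = Files.createTempFile("zipfs-probe", ".zip");
            try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
                out.putNextEntry(new ZipEntry("hello.txt"));
                out.write("hello".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return zip;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Returns the size reported by the channel opened for the given entry. */
    static long entrySize(Path zip, String entry) {
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
             SeekableByteChannel channel = Files.newByteChannel(fs.getPath(entry))) {
            return channel.size();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Reports whether the entry's channel accepts a call to position(long). */
    static boolean supportsSeek(Path zip, String entry) {
        URI uri = URI.create("jar:" + zip.toUri());
        try (FileSystem fs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
             SeekableByteChannel channel = Files.newByteChannel(fs.getPath(entry))) {
            channel.position(1);
            return true;
        } catch (UnsupportedOperationException ex) {
            return false;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        Path zip = writeSampleArchive();
        System.out.println("size: " + entrySize(zip, "/hello.txt"));
        System.out.println("seek supported: " + supportsSeek(zip, "/hello.txt"));
        zip.toFile().delete();
    }
}
```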



Re: [Tomcat Wiki] Update of "Development/NestedFilesystem" by jboynes

Posted by Jeremy Boynes <jb...@apache.org>.
> On Apr 5, 2015, at 2:25 PM, Apache Wiki <wi...@apache.org> wrote:
> 
> Dear Wiki user,
> 
> You have subscribed to a wiki page or wiki category on "Tomcat Wiki" for change notification.
> 
> The "Development/NestedFilesystem" page has been changed by jboynes:
> https://wiki.apache.org/tomcat/Development/NestedFilesystem

I started capturing thoughts on the NIO FS in this page but still have a bit more to add.