Posted to fop-dev@xmlgraphics.apache.org by Alexander Kiel <al...@gmx.net> on 2009/09/24 17:53:29 UTC

Best Interface for reading OpenType Files

Hi,

I am currently thinking about which interface to use for reading
OpenType files.

There are two possibilities:

 - reading on top of an InputStream or
 - reading on top of a RandomAccessFile or FileChannel.

Currently the implementation in FOP uses the class FontFileReader which
expects an InputStream. But it immediately calls IOUtils.toByteArray(in)
and works on that byte array instead. So it needs to hold the file
completely in memory.

FontBox, which is part of PDFBox, uses an abstract class called
TTFDataStream with template methods. It has two implementations:
RAFDataStream, which operates on top of a RandomAccessFile, and
MemoryTTFDataStream, which operates on top of a byte array.

I started using pure InputStreams. That means I implemented the whole
OpenType file reading using a hierarchy of FilterInputStreams. At the
lowest level I have a DataInputStream which takes any InputStream and
provides methods to read the basic data types of OpenType, just like
java.io.DataInputStream does for Java data types. On top of that, I have
streams that can read some small-scale data structures, then streams
which can read whole tables and finally a stream which can read the
whole OpenType file.
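
A minimal sketch of what such a lowest-level stream could look like,
assuming the usual big-endian encoding of OpenType's uint16, uint32 and
Tag types. The class name OtfDataInputStream and its method names are
hypothetical and only used for illustration; the class in this mail is
simply called DataInputStream.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a stream that reads basic OpenType data types. */
class OtfDataInputStream extends FilterInputStream {

    OtfDataInputStream(InputStream in) {
        super(in);
    }

    /** Reads one byte or fails if the stream has ended. */
    private int readByteOrFail() throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException();
        }
        return b;
    }

    /** uint16: two bytes, big-endian, returned as a non-negative int. */
    int readUInt16() throws IOException {
        int hi = readByteOrFail();
        int lo = readByteOrFail();
        return (hi << 8) | lo;
    }

    /** uint32: four bytes, big-endian, returned as a long to stay unsigned. */
    long readUInt32() throws IOException {
        long hi = readUInt16();
        long lo = readUInt16();
        return (hi << 16) | lo;
    }

    /** Tag: four bytes interpreted as ASCII, for example "cmap". */
    String readTag() throws IOException {
        byte[] tag = new byte[4];
        for (int i = 0; i < tag.length; i++) {
            tag[i] = (byte) readByteOrFail();
        }
        return new String(tag, StandardCharsets.US_ASCII);
    }
}
```

Because it only wraps an InputStream, such a class can be tested against
a ByteArrayInputStream holding a handful of hand-written bytes.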

To read an OpenType file, all you have to write is:

    InputStream in = ...
    OpenTypeFileInputStream otfIn = new OpenTypeFileInputStream(in);
    OpenTypeFile otf = otfIn.readOpenTypeFile();

In my opinion this system works really well. You can take any
InputStream, the reading is decoupled from the OpenType classes
themselves and you can test pieces of OpenType structure using only the
individual streams.

But! My approach has one flaw. I need to seek extensively while reading
an OpenType file. The whole file format consists of headers with offsets
and data structures which one has to read from those offsets.

To get this seeking to work with streams, I use mark(), reset() and
skip(). My common approach at the beginning of such a structure is to
mark, then read the header and, for every part, reset to the start, mark
again, skip to the offset and read the part.
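
The mark, reset, skip cycle described above can be sketched roughly as
follows. This is an illustration only: the class MarkResetSeek, its
methods and the one-byte "parts" are hypothetical simplifications, and
the offsets are assumed to be relative to the stream position at the
time of the call.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of seeking with mark(), reset() and skip(). */
final class MarkResetSeek {

    /** Skips exactly n bytes; InputStream.skip may skip fewer than asked. */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip made no progress: fall back to reading one byte
                if (in.read() < 0) {
                    throw new EOFException();
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Reads one byte at each offset, counted from the current position. */
    static int[] readBytesAt(InputStream in, int[] offsets, int readLimit)
            throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported");
        }
        int[] parts = new int[offsets.length];
        in.mark(readLimit);             // remember the start of the structure
        for (int i = 0; i < offsets.length; i++) {
            in.reset();                 // go back to the start
            in.mark(readLimit);         // mark again, as described above
            skipFully(in, offsets[i]);  // move forward to the part
            parts[i] = in.read();       // "read the part" (one byte here)
        }
        return parts;
    }
}
```

Note that readLimit has to cover the largest offset plus the size of the
part, which is why a buffering stream ends up keeping that much data in
memory.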

But with this approach I end up holding the whole file in memory.

To make it worse, this mark(), reset(), skip() interface doesn't support
hierarchical marking. If I seek inside smaller-scale structures, the mark
position of the larger-scale structure is overwritten. I don't think
that it is possible to build hierarchical mark support on top of any
markable InputStream. (Oh look, I did it later as I wrote this longish
mail.) I think one has to reimplement BufferedInputStream, holding one's
own byte array. In fact I did this on top of ByteArrayInputStream. The
key problem is that one can't get a position out of an InputStream,
which is not surprising, as the concept of streams doesn't include a
position.

It is possible to read the parts in offset order. But there are
duplicated offsets (more than one offset pointing to the same part) and
parts that have to go into an array in a semantic order which doesn't
have to be the offset order. So I have to first reorder the offsets to
read the parts in offset order and then I have to reorder the read parts
again to get them back into the semantic order. That said - it is still
possible that the offsets are in fact in the semantic order of the
parts, but the spec doesn't say this.
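
A rough sketch of this double reordering, under simplifying assumptions
that are not part of the OpenType spec: every part has the same fixed
length, parts do not overlap, and offsets count from the current stream
position. The class OffsetOrderReader and its method are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch: read in offset order, return in semantic order. */
final class OffsetOrderReader {

    static byte[][] readParts(InputStream in, int[] offsets, int partLength)
            throws IOException {
        // Distinct offsets in ascending order; duplicates collapse here.
        TreeSet<Integer> sorted = new TreeSet<>();
        for (int offset : offsets) {
            sorted.add(offset);
        }

        // java.io.DataInputStream is used only for its readFully method.
        DataInputStream data = new DataInputStream(in);
        Map<Integer, byte[]> byOffset = new HashMap<>();
        int pos = 0;
        for (int offset : sorted) {
            if (offset < pos) {
                throw new IOException("overlapping parts are not supported");
            }
            data.readFully(new byte[offset - pos]); // discard the gap
            byte[] part = new byte[partLength];
            data.readFully(part);                   // read the part itself
            pos = offset + partLength;
            byOffset.put(offset, part);
        }

        // Put the parts back into the semantic order of the offset array.
        byte[][] result = new byte[offsets.length][];
        for (int i = 0; i < offsets.length; i++) {
            result[i] = byOffset.get(offsets[i]);
        }
        return result;
    }
}
```

This only ever moves forward in the stream, so no mark is needed, but
the bookkeeping grows quickly once parts have variable lengths or nested
offsets of their own.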

I don't want to depend on RandomAccessFile or FileChannel, because I
need to be able to test reading of substructures out of byte arrays.
What I need is an interface from which I can read bytes and which allows
multiple relative seeks. By multiple relative seeks I mean something
like multiple marks. As I wrote this, I implemented such a thing inside
my DataInputStream. There is now a method:

    public SkipHandle mark();

and the SkipHandle class looks like this:

    public class SkipHandle {
        
        private final long relativePos;

        public void skipTo(long offset);
    }

SkipHandle is a non-static inner class of DataInputStream.
DataInputStream counts the bytes read and skipped to keep track of its
current position. The SkipHandle records the current stream position on
creation so that it is able to skip on DataInputStream relative to its
creation position. If the skip would be negative, SkipHandle resets the
whole stream to the start (on creation of DataInputStream, a normal mark
is set) and skips afterwards.
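
Filled in, the mechanism could look roughly like the following sketch.
It is a reconstruction from the description above, not the actual code:
the outer class name PositionTrackingInputStream is hypothetical, and
only mark(), SkipHandle, relativePos and skipTo() are taken from the
mail.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a stream with hierarchical skip handles. */
class PositionTrackingInputStream extends FilterInputStream {

    private long position; // bytes read or skipped since construction

    PositionTrackingInputStream(InputStream in) throws IOException {
        super(in);
        if (!in.markSupported()) {
            throw new IOException("underlying stream must support mark/reset");
        }
        in.mark(Integer.MAX_VALUE); // one normal mark at the very start
    }

    @Override public int read() throws IOException {
        int b = in.read();
        if (b >= 0) position++;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) position += n;
        return n;
    }

    @Override public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        if (skipped > 0) position += skipped;
        return skipped;
    }

    // Normal marking is disabled so that nobody overwrites the start mark.
    @Override public boolean markSupported() { return false; }
    @Override public synchronized void mark(int readLimit) { }
    @Override public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    /** Creates a handle that remembers the current position. */
    public SkipHandle mark() {
        return new SkipHandle();
    }

    public class SkipHandle {

        private final long relativePos = position;

        /** Moves the stream to relativePos + offset. */
        public void skipTo(long offset) throws IOException {
            long target = relativePos + offset;
            if (target < position) { // backwards: restart from the beginning
                in.reset();
                position = 0;
            }
            long remaining = target - position;
            while (remaining > 0) {
                long skipped = skip(remaining);
                if (skipped <= 0) { // no progress: read one byte instead
                    if (read() < 0) throw new EOFException();
                    skipped = 1;
                }
                remaining -= skipped;
            }
        }
    }
}
```

Several handles can be alive at once, so an inner structure can seek
without destroying the reference point of the structure that contains
it.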

It works, but I find it a little bit ugly. First I have to set a
mark(Integer.MAX_VALUE) on DataInputStream creation, because I always
want to be able to reset the whole stream, but I don't have any
information about how many bytes are still to come. Then I have to
disable mark support on my DataInputStream so that nobody kills my own
mark.

But the biggest problem is that DataInputStream now has a non-standard
mark(), skipTo() API. It's not like a normal FilterInputStream anymore.
You can't use normal marking, because it's disabled, and you have to
learn this new API instead.

Streams simply aren't the right API for reading stuff like OpenType
files which require massive seeking. But all the seekable APIs are
tied to files.

The TTFDataStream API of FontBox is completely custom. I would like to
avoid such things. 

So I simply don't know a standard Java API which allows byte reading and
seeking over an arbitrary source and throws IOExceptions on its methods.
What about NIO? I don't see any skipping or seeking on channels.

Any idea is welcome.


Best Regards
Alex
 
-  
e-mail: alexanderkiel@gmx.net
web:    www.alexanderkiel.net