Posted to dev@commons.apache.org by Brett Henderson <br...@mail15.com> on 2003/11/03 00:46:48 UTC

Streamable Codec Framework

Hi All,

I noticed Alexander Hvostov's recent email containing streamable
base64 codecs.  Given that the current codec implementations are
oriented around in-memory buffers, is there room for an
alternative codec framework supporting stream functionality?  I
realise the need for streamable codecs may not be that great but
it does seem like a gap in the current library.

I have done some work in this area over the last couple of months
as a small hobby project and have produced a small framework for
streamable codecs.

Some of the goals I was working towards were:
1. No memory allocation during streaming.  This eliminates
garbage collection during large conversions.
2. Pipelineable codecs.  This allows multiple codecs to be chained
together and treated as a single codec.  This allows codecs such as
base 64 to be broken into two components (base64 and line wrapping
codecs).
3. Single OutputStream, InputStream implementations which
utilise codec engines internally.  This eliminates the need to
produce a buffer based engine and a stream engine for every codec.
Note that this requires codec engines to be written in a manner
that supports streaming.
4. Customisable receivers.  All codecs utilise receivers to
handle conversion results.  This allows different outputs such as
streams, in-memory buffers, etc. to be supported.
5. Direction agnostic codecs.  Decoupling the engine from the
streams allows the engines to be used in different ways than
originally intended.  For example, you can perform base64 encoding
during reads from an InputStream.
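
A minimal sketch of how engines, receivers and chaining could fit
together.  All names here (Receiver, Engine, HexEncoder, LineWrapper,
chain) are hypothetical and written in modern Java for brevity; they
are not taken from the BHCodec sources.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is not the BHCodec API.
final class CodecSketch {

    /** Receives conversion results (a stream, a buffer, another engine). */
    interface Receiver {
        void receive(byte[] buf, int off, int len);
    }

    /** A streaming transform that pushes its output to a receiver. */
    interface Engine {
        void setReceiver(Receiver receiver);
        void process(byte[] buf, int off, int len);
    }

    /** ASCII hex encoder; its output buffer is allocated once and reused. */
    static final class HexEncoder implements Engine {
        private static final byte[] DIGITS =
                "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        private final byte[] out = new byte[512];
        private Receiver receiver;

        public void setReceiver(Receiver receiver) { this.receiver = receiver; }

        public void process(byte[] buf, int off, int len) {
            int n = 0;
            for (int i = off; i < off + len; i++) {
                out[n++] = DIGITS[(buf[i] >> 4) & 0x0f];
                out[n++] = DIGITS[buf[i] & 0x0f];
                // Hand off a full buffer, then start refilling it.
                if (n == out.length) { receiver.receive(out, 0, n); n = 0; }
            }
            if (n > 0) receiver.receive(out, 0, n);
        }
    }

    /** Inserts a newline after every `width` bytes; forwards input slices as-is. */
    static final class LineWrapper implements Engine {
        private static final byte[] NEWLINE = {'\n'};
        private final int width;
        private int column;
        private Receiver receiver;

        LineWrapper(int width) { this.width = width; }

        public void setReceiver(Receiver receiver) { this.receiver = receiver; }

        public void process(byte[] buf, int off, int len) {
            while (len > 0) {
                int n = Math.min(len, width - column);
                receiver.receive(buf, off, n);
                off += n; len -= n; column += n;
                if (column == width) { receiver.receive(NEWLINE, 0, 1); column = 0; }
            }
        }
    }

    /** Chains two engines so that they behave as a single engine. */
    static Engine chain(final Engine first, final Engine second) {
        first.setReceiver(second::process);
        return new Engine() {
            public void setReceiver(Receiver r) { second.setReceiver(r); }
            public void process(byte[] b, int off, int len) { first.process(b, off, len); }
        };
    }

    /** Hex-encodes `data` and wraps the result at `width` characters. */
    static byte[] encode(byte[] data, int width) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Engine engine = chain(new HexEncoder(), new LineWrapper(width));
        engine.setReceiver(sink::write); // the receiver decides where output goes
        engine.process(data, 0, data.length);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] in = "Hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(encode(in, 4), StandardCharsets.US_ASCII));
    }
}
```

Because the engines only ever talk to a Receiver, the same chain
could feed a stream, an in-memory buffer or another engine without
any change to the engines themselves.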

I have produced base64 and ascii hex codecs as a proof of concept
and to evaluate performance.  The framework isn't as fast as the
current buffer based codecs, and is unlikely ever to be, due to the
extra overheads associated with streaming.
Both base64 and ascii hex implementations can produce a data rate
of approximately 40MB/sec on a Pentium Mobile 1.5GHz notebook.
With some performance tuning I'm sure this could be improved;
I think array bounds checking is the largest performance hit.

The code currently requires jdk1.4 (exception handling requires
rework for jdk1.3).
Running ant without arguments in the root directory will build
the project, run all unit tests and run performance tests.  Note
that the tests require junit to be available within ant.

Javadocs are the only documentation at the moment.

Files can be found at:
http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip

I hope someone finds this useful.  I'm not trying to force my
implementation on anybody and I'm sure it could be improved in
many ways.  I'm simply putting it forward as an optional approach.
If it is decided that streamable codecs are a useful addition to
commons I'd be glad to help.

Cheers,
Brett

PS.  Some areas that currently need improving are:
1. Exception handling requires jdk1.4, should be rewritten to
support older java versions.
2. BufferReceiver allocates memory continuously during streamed
conversions, should be fixed to recycle memory buffers.
3. Engines should have a new flush method added to allow them
to hold off posting to receivers until their internal buffers
fill up.  This would prevent fragmented buffers during
pipelined conversions.
4. OutputStream flush needs rework, shouldn't call finalize,
should call new flush method on CodecEngines.


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [codec] Streamable Codec Framework

Posted by Brett Henderson <br...@mail15.com>.
> > Pipelineable codecs.  This allows multiple codecs to be chained
> > together and treated as a single codec.  This allows codecs such
> > as base 64 to be broken into two components (base64 and line
> > wrapping codecs).
>
> We have something similar in James where we handle
> dot-stuffing (actually, we have several specialized stream
> transforms).  Are you using FilterInputStream?

Both CodecInputStream and CodecOutputStream use FilterInputStream
and FilterOutputStream respectively.

As I believe you were alluding to, one method for chaining multiple
codecs together is to use multiple CodecXXXStreams together with a
single CodecEngine implementation used by each.

Alternatively the ChainEngine can be used to wrap multiple
CodecEngine instances together into single logical units and the
ChainEngine used in a single CodecXXXStream instance.  For output
streams the difference may only be one of semantics but for input
streams I believe the performance difference would be considerable.
Perhaps this indicates issues with the design of CodecInputStream
but I couldn't think of a more effective way of dealing with
CodecEngine instances in a generic manner.
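
To illustrate the first approach (one stream per engine, stacked),
here is a rough sketch of the output side.  EngineOutputStream and
the Engine interface below are hypothetical stand-ins written in
modern Java; they are not the actual CodecOutputStream or
CodecEngine classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical names throughout; this is not the BHCodec API.
final class StackedStreamsSketch {

    /** Transforms its input and writes the result straight to `out`. */
    interface Engine {
        void process(byte[] buf, int off, int len, OutputStream out) throws IOException;
    }

    /** One generic stream class that can host any engine. */
    static final class EngineOutputStream extends FilterOutputStream {
        private final Engine engine;
        private final byte[] single = new byte[1];

        EngineOutputStream(OutputStream out, Engine engine) {
            super(out);
            this.engine = engine;
        }

        @Override public void write(int b) throws IOException {
            single[0] = (byte) b;
            write(single, 0, 1);
        }

        @Override public void write(byte[] buf, int off, int len) throws IOException {
            engine.process(buf, off, len, out);
        }
    }

    private static final String DIGITS = "0123456789abcdef";

    /** Stateless ASCII hex encoder. */
    static final Engine HEX = (buf, off, len, out) -> {
        for (int i = off; i < off + len; i++) {
            out.write(DIGITS.charAt((buf[i] >> 4) & 0x0f));
            out.write(DIGITS.charAt(buf[i] & 0x0f));
        }
    };

    /** Stateful line wrapper: a newline after every `width` bytes. */
    static Engine lineWrapper(final int width) {
        return new Engine() {
            private int column;

            public void process(byte[] buf, int off, int len, OutputStream out)
                    throws IOException {
                while (len > 0) {
                    int n = Math.min(len, width - column);
                    out.write(buf, off, n);
                    off += n; len -= n; column += n;
                    if (column == width) { out.write('\n'); column = 0; }
                }
            }
        };
    }

    /** Stacks two streams: data is hex-encoded first, then line-wrapped. */
    static byte[] encode(byte[] data, int width) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream stacked = new EngineOutputStream(
                new EngineOutputStream(sink, lineWrapper(width)), HEX)) {
            stacked.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

On the output side each stream simply pushes into the next, so
stacking costs little beyond the extra method calls.  An input
stream has to pull data through each layer instead, which is where
I would expect a single chained engine to pay off.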

> > Single OutputStream, InputStream implementations which utilise
> > codec engines internally.  This eliminates the need to produce a
> > buffer based engine and a stream engine for every codec.
> 
> Seems like it would be good to support NIO as well.

I don't have experience with NIO (yet another task on my
TOLEARN list ;-), any hints or suggestions on what it would take
to support NIO would be appreciated.

> 
> > http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip
> 
> I'll take a look when I get back into the office.  From your 
> list of things to be fixed, seems you've a good handle on it.
> 
> The other thing I want is a stream-based regex matcher.  
> Doesn't need to be full Perl; awk would be fine.

This brings up an interesting point.  The current implementation
pays no attention to Strings, only bytes.  Would regular
expressions require String (or StringBuffer) support?  If so
it may require the addition of a new codec engine type that
operates on strings.  This may be necessary for XML manipulation
as well.

Note that if an engine is configured using methods accepting
Strings but still processes byte arrays then the existing model
fits (ie. Regular expressions specified as strings but matching
performed on byte sequences).
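
As a rough illustration of that last point, java.util.regex can match
against any CharSequence, so a thin view over a byte array lets a
pattern given as a String run directly on bytes.  ByteSequence and
countMatches are hypothetical names, each byte is treated as one
character (0 to 255), and the sketch assumes the data sits in a
single buffer; matches that span chunk boundaries in a real stream
are not handled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; this is not the BHCodec API.
final class ByteRegexSketch {

    /** Read-only CharSequence view of a byte range; one byte per char. */
    static final class ByteSequence implements CharSequence {
        private final byte[] buf;
        private final int off;
        private final int len;

        ByteSequence(byte[] buf, int off, int len) {
            this.buf = buf;
            this.off = off;
            this.len = len;
        }

        public int length() { return len; }

        public char charAt(int index) { return (char) (buf[off + index] & 0xff); }

        public CharSequence subSequence(int start, int end) {
            return new ByteSequence(buf, off + start, end - start);
        }

        @Override public String toString() {
            return new String(buf, off, len, StandardCharsets.ISO_8859_1);
        }
    }

    /** Counts matches of `regex` in `data` without copying it into a String. */
    static int countMatches(String regex, byte[] data) {
        Matcher m = Pattern.compile(regex)
                .matcher(new ByteSequence(data, 0, data.length));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Lines starting with '.', as a dot-stuffing transform would look for.
        byte[] data = "a\r\n.b\r\n.\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countMatches("\\r\\n\\.", data));
    }
}
```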

I haven't thought any of this through yet though so I'm unsure
how it could fit into the existing design.  I don't want to
add new methods to the existing CodecEngine interface because
it is designed for performing transforms on byte sequences
and nothing else.  I see string manipulation as a related but
separate issue (perhaps requiring addition of new interfaces).




RE: Streamable Codec Framework

Posted by "Noel J. Bergman" <no...@devtech.com>.
> Given that the current codec implementations are oriented
> around in-memory buffers, is there room for an alternative
> codec framework supporting stream functionality?  I realise
> the need for streamable codecs may not be that great but
> it does seem like a gap in the current library.

If we had them, they could be quite useful within messaging servers.  Having
to process messages in-memory is often undesirable.

>  No memory allocation during streaming.  This eliminates
>  garbage collection during large conversions.

That's the big win.  :-)

>  Pipelineable codecs.  This allows multiple codecs to be chained
>  together and treated as a single codec.  This allows codecs such
>  as base 64 to be broken into two components (base64 and line
>  wrapping codecs).

We have something similar in James where we handle dot-stuffing (actually, we
have several specialized stream transforms).  Are you using
FilterInputStream?

>  Single OutputStream, InputStream implementations which
>  utilise codec engines internally.  This eliminates the
>  need to produce a buffer based engine and a stream engine
>  for every codec.

Seems like it would be good to support NIO as well.

> http://www32.brinkster.com/bretthenderson/BHCodec-0.2.zip

I'll take a look when I get back into the office.  From your list of things
to be fixed, seems you've a good handle on it.

The other thing I want is a stream-based regex matcher.  Doesn't need to be
full Perl; awk would be fine.

	--- Noel

