Posted to commits@directory.apache.org by ak...@apache.org on 2005/01/15 22:00:52 UTC
svn commit: r125307 - /incubator/directory/asn1/trunk/xdocs/index.xml /incubator/directory/asn1/trunk/xdocs/refactor.xml
Author: akarasulu
Date: Sat Jan 15 13:00:52 2005
New Revision: 125307
URL: http://svn.apache.org/viewcvs?view=rev&rev=125307
Log:
updates to site with docs on how we are going to refactor asn1 to make it better and faster
Added:
incubator/directory/asn1/trunk/xdocs/refactor.xml
Modified:
incubator/directory/asn1/trunk/xdocs/index.xml
Modified: incubator/directory/asn1/trunk/xdocs/index.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/index.xml?view=diff&rev=125307&p1=incubator/directory/asn1/trunk/xdocs/index.xml&r1=125306&p2=incubator/directory/asn1/trunk/xdocs/index.xml&r2=125307
==============================================================================
--- incubator/directory/asn1/trunk/xdocs/index.xml (original)
+++ incubator/directory/asn1/trunk/xdocs/index.xml Sat Jan 15 13:00:52 2005
@@ -45,6 +45,14 @@
</td>
</tr>
</table>
+
+ <subsection name="Refactoring in 0.3 Branch">
+ <p>
+ For the next dev cycle we're radically refactoring the structure of
+ these modules. For more information you can take a look
+ <a href="./refactor.html">here</a>.
+ </p>
+ </subsection>
</section>
<section name="Motivation">
Added: incubator/directory/asn1/trunk/xdocs/refactor.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/refactor.xml?view=auto&rev=125307
==============================================================================
--- (empty file)
+++ incubator/directory/asn1/trunk/xdocs/refactor.xml Sat Jan 15 13:00:52 2005
@@ -0,0 +1,591 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<document>
+ <properties>
+ <author email="akarasulu@apache.org">Alex Karasulu</author>
+ <title>Refactoring the ASN.1 Runtime</title>
+ </properties>
+
+ <body>
+ <section name="Refactoring the ASN.1 Runtime">
+ <p>
+ The use of Snacc4J as the runtime ASN.1 BER codec for LDAP imposed an
+ IP issue for the new Directory Project under incubation. This resulted
+ in the creation of our own implementation: the Apache ASN.1
+ Runtime library.
+ </p>
+
+ <p>
+ Before continuing any further it might be a good idea to read about
+ the existing architecture to understand the changes that are being
+ proposed.
+ </p>
+
+ <subsection name="High Level Goals and Changes">
+ <p>
+ The internal 0.2 release was the first successful attempt to produce a
+ replacement for Snacc4J. As of release 0.8 of ApacheDS it provides
+ BER encoders and decoders for LDAP requests and responses. The library
+ was designed with performance in mind. Some very good ideas were
+ introduced and really put to the test. However, the library does have
+ performance problems: the designs to make this into a high performance
+ library were not totally followed through. Furthermore, the code base
+ is very difficult to maintain and needs some reorganization. We hope
+ to refactor the library so it is more efficient and easier to maintain
+ while reducing the number of dependencies it has. In the process we
+ would like to introduce some new features and improvements, which are
+ listed below:
+ </p>
+
+ <ul>
+ <li>
+ Better ByteBuffer utilization by splicing buffers instead of copying
+ them.
+ </li>
+
+ <li>
+ Replace the current Tuple class with well defined Tuple interfaces:
+ specifically we need to remove TLV field processing from a Tuple
+ as well as tag cooking functionality. Tag cooking refers to the
+ application of transformations that turn tag bytes into a 4 byte
+ Java primitive integer. These functions need to be localized
+ within utility classes.
+ </li>
+
+ <li>
+ Some BER based protocols only use a subset of the encoding rules.
+ For example LDAP only uses definite length encodings for constructed
+ tuples. A reduced set of rules is much easier to code and maintain,
+ and often will perform significantly better than codecs designed for
+ the entire rule set. The key here, however, is to make sure that
+ the core of the codec can be replaced transparently without imposing
+ code changes.
+ </li>
+
+ <li>
+ The Tuples of primitives like binary values store the Tag, Length
+ and Value of the primitive TLV Tuple in memory. Sometimes primitive
+ values can be dangerously large for a server to encode or decode.
+ Primitive tuples could be blobs of large binaries like images. If
+ tuple values are larger than some application defined limit they
+ ought to be streamed to disk rather than kept in main memory.
+ Streaming to disk makes the server more efficient overall since it
+ can maintain a constant sized decoding footprint. However switching
+ to disk based storage will rightfully slow down the current operation
+ which involves a large primitive. This is a tradeoff that should
+ be configurable by API users and ultimately ApacheDS administrators.
+ </li>
+
+ <li>
+ Better logging and error handling for codecs with perhaps some
+ management interfaces to control the properties of codecs.
+ </li>
+
+ <li>
+ A single deployable artifact where the ber and codec jars are fused.
+ </li>
+
+ <li>
+ Make the code easier to maintain while improving its structure.
+ </li>
+ </ul>
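The tag cooking mentioned in the list above could be localized in a small utility along the following lines. This is only a sketch; the class and method names are illustrative, not existing API.

```java
/**
 * Hypothetical sketch of a tag cooking utility: packs up to four raw
 * tag bytes of a TLV into a single Java primitive int, first tag byte
 * in the most significant position.
 */
class TagUtils
{
    /**
     * Cooks raw tag bytes into a primitive int.
     *
     * @param tagBytes the raw tag bytes read from the TLV stream
     * @param offset index of the first tag byte
     * @param length number of tag bytes (1 to 4)
     * @return the tag bytes packed into an int
     */
    static int cookTag( byte[] tagBytes, int offset, int length )
    {
        if ( length < 1 || length > 4 )
        {
            throw new IllegalArgumentException( "tags must be 1 to 4 bytes" );
        }

        int cooked = 0;

        for ( int ii = 0; ii < length; ii++ )
        {
            cooked <<= 8;                          // shift previous bytes up
            cooked |= tagBytes[offset + ii] & 0xFF; // mask to avoid sign extension
        }

        return cooked;
    }
}
```

Keeping this in a utility class rather than on the Tuple itself is exactly the localization the bullet above argues for.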
+ </subsection>
+
+
+ </section>
+
+ <section name="Tuple Interface/Class Hierarchies">
+ <p>
+ Presently Tuples contain the functionality to decode and encode
+ fields. Tuples can even encode themselves to a buffer as BER or
+ DER. A Tuple should be a simple bean and nothing more, but today it is not.
+ Hence one of our goals is to factor out this additional functionality.
+ </p>
+
+ <p>
+ A Tuple is a single class that acts more like a union of different
+ types rather than using inheritance to differentiate. There are
+ distinct types of tuples, constructed versus primitive for example.
+ Instead of using complex logic to determine what kind of Tuple an
+ instance is, it is much better to differentiate the Tuple into
+ subclasses. Hence we propose a new interface and implementation
+ hierarchy for Tuples.
+ </p>
+
+ <p>
+ Let's start by proposing a minimal Tuple interface.
+ </p>
+
+<source>
+interface Tuple
+{
+ /**
+ * Gets the zero based index into a PDU where the first byte of this
+ * Tuple's tag resides.
+ *
+ * @return zero based index of Tag's first byte in the PDU
+ */
+ int getTagStartIndex();
+
+ /**
+ * Gets this TLV Tuple's Tag (T) as a type safe enumeration.
+ *
+ * @return type safe enumeration for the Tag
+ */
+ TagEnum getTag();
+
+ /**
+ * Gets whether or not this Tuple is constructed.
+ *
+ * @return true if the Tag is constructed, false if it is primitive.
+ */
+ boolean isConstructed();
+}
+</source>
+
+ <p>
+ This interface gives the minimum information needed for a Tuple
+ that is not specific to another specialized type of Tuple, meaning
+ all Tuples share these methods. We can also go a step further and
+ implement an AbstractTuple where protected members are used to
+ implement these methods. Note that isConstructed() will probably be
+ left abstract so subclasses can just return true or false. For
+ brevity this code is not shown but other classes in the section below
+ will extend from AbstractTuple.
+ </p>
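For reference, one possible shape for such an AbstractTuple is sketched below. The stand-in interfaces are included only so the sketch is self-contained; the real definitions are the ones proposed above, and the -1 default for the tag start index anticipates the terminator concern discussed in the notes.

```java
// Minimal stand-ins so this condensed sketch compiles on its own
interface TagEnum {}

interface Tuple
{
    int getTagStartIndex();
    TagEnum getTag();
    boolean isConstructed();
}

/**
 * Sketch of an AbstractTuple: protected members back the shared
 * accessors while isConstructed() stays abstract so subclasses
 * simply return a constant true or false.
 */
abstract class AbstractTuple implements Tuple
{
    /** zero based index of the Tag's first byte; -1 until explicitly set */
    protected int tagStartIndex = -1;

    /** this Tuple's Tag as a type safe enumeration */
    protected TagEnum tag;

    public int getTagStartIndex()
    {
        return tagStartIndex;
    }

    public TagEnum getTag()
    {
        return tag;
    }

    // left abstract: each subclass lineage returns a constant
    public abstract boolean isConstructed();
}
```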
+
+ <subsection name="Primitive Vs. Constructed Tuples">
+ <p>
+ We need to go a step further and start differentiating between Tuples
+ that are primitive and those that are constructed. In this step we
+ introduce two new abstract classes PrimitiveTuple and
+ ConstructedTuple.
+ </p>
+
+ <p>
+ These two classes will be described below, but one might ask why both
+ are still abstract. This is because we need to differentiate further
+ between buffered versus streamed Tuples in the case of primitive
+ Tuples. For constructed Tuples we need to differentiate between
+ definite length versus indefinite length Tuples. With our approach,
+ only the leaf nodes of the inheritance hierarchy will be concrete.
+ Below is the definition for the PrimitiveTuple.
+ </p>
+
+<source>
+public abstract class PrimitiveTuple extends AbstractTuple
+{
+ /** the number of bytes used to compose the Tuple's length field */
+ protected int lengthFieldSz = 0;
+ /** the number of bytes used to compose the Tuple's value field */
+ protected int valueFieldSz = 0;
+
+ ...
+
+ public final boolean isConstructed()
+ {
+ return false;
+ }
+
+ /**
+ * Gets whether or not this Tuple's value is buffered in memory or
+ * streamed to disk.
+ *
+ * @return true if the value is buffered in memory, false if it is streamed
+ * to disk
+ */
+ public abstract boolean isBuffered();
+
+ /**
+ * Gets the number of bytes in the length (L) field of this TLV Tuple.
+ *
+ * @return number of bytes for the length
+ */
+ public final int getLengthFieldSize()
+ {
+ return lengthFieldSz;
+ }
+
+ /**
+ * Gets the number of bytes in the value (V) field of this TLV Tuple.
+ *
+ * @return number of bytes for the value
+ */
+ public final int getValueFieldSize()
+ {
+ return valueFieldSz;
+ }
+
+ ...
+}
+</source>
+ <p>
+ This abstract class adds two new concrete methods for tracking the
+ size of the length and value fields. Constructed Tuples may not
+ necessarily have a length value associated with them if they are
+ of the indeterminate form. Furthermore the value of constructed
+ Tuples are the nested child Tuples subordinate to them. So there
+ is no need to track the value prematurely now for anything other
+ than primitive Tuples.
+ </p>
+
+ <p>
+ Note that the isConstructed() method is implemented as final and
+ always returns false for this lineage of Tuples. A final modifier on
+ the method makes sense and sometimes helps the compiler inline this
+ method so we don't always pay a price for using it in addition to
+ subclassing. A new abstract method isBuffered() is introduced which
+ is discussed in detail within the Buffered Vs. Streamed section.
+ </p>
+
+ <p>
+ Now let's take a look at the ConstructedTuple abstract class.
+ </p>
+
+<source>
+public abstract class ConstructedTuple extends AbstractTuple
+{
+ public final boolean isConstructed()
+ {
+ return true;
+ }
+
+ /**
+ * Gets whether or not the length of this constructed Tuple is of the
+ * definite form or of the indefinite length form.
+ *
+ * @return true if the length is definite, false if the length is of the
+ * indefinite form
+ */
+ public abstract boolean isLengthDefinite();
+}
+</source>
+
+ <p>
+ ConstructedTuple implements the <code>isConstructed()</code> method
+ as final since it will always return true for this lineage of
+ Tuples. Also a new abstract method isLengthDefinite() is introduced
+ to see if the Tuple uses the indefinite length form or not.
+ </p>
+ </subsection>
+
+ <subsection name="Definite Vs. Indefinite Length">
+ <p>
+ The ConstructedTuple can be further differentiated into two
+ subclasses to represent definite and indefinite length constructed
+ TLV Tuples. The indefinite form does not have a length value
+ associated with it whereas the definite length form does. Let's
+ explore the concrete IndefiniteLength definition.
+ </p>
+
+<source>
+public class IndefiniteLength extends ConstructedTuple
+{
+ public final boolean isLengthDefinite()
+ {
+ return false;
+ }
+}
+</source>
+
+ <p>
+ Yep, this is pretty simple. There is very little to track for this
+ Tuple since most of the tracking is handled by its descendant Tuples.
+ The class is also concrete. What about the DefiniteLength
+ implementation ...
+ </p>
+
+<source>
+public class DefiniteLength extends ConstructedTuple
+{
+ /** the number of bytes used to compose the Tuple's length field */
+ protected int lengthFieldSz = 0;
+ /** the number of bytes used to compose the Tuple's value field */
+ protected int valueFieldSz = 0;
+
+ ...
+
+ public final boolean isLengthDefinite()
+ {
+ return true;
+ }
+
+ /**
+ * Gets the number of bytes in the length (L) field of this TLV Tuple.
+ *
+ * @return number of bytes for the length
+ */
+ public final int getLengthFieldSize()
+ {
+ return lengthFieldSz;
+ }
+
+ /**
+ * Gets the number of bytes in the value (V) field of this TLV Tuple.
+ *
+ * @return number of bytes for the value
+ */
+ public final int getValueFieldSize()
+ {
+ return valueFieldSz;
+ }
+}
+</source>
+ <p>
+ Now this introduces two new concrete methods for getting the length
+ of the length field and the length of the value field. A definite
+ length TLV has a valid value within the Length (L) field. The value
+ of the length field is the length of the value field. Hence the
+ reason why we include both these concrete methods.
+ </p>
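To make the relationship between the two field sizes concrete, here is a small sketch of how the size of a BER length (L) field follows from the size of the value (V) field: lengths under 128 use the short form (one octet), larger lengths use the long form (a header octet plus one octet per significant length byte). The helper class name is illustrative only.

```java
/**
 * Illustrative helper computing the size in bytes of a BER definite
 * length (L) field for a given value (V) field size.
 */
class LengthUtils
{
    static int getLengthFieldSize( int valueFieldSz )
    {
        if ( valueFieldSz < 128 )
        {
            return 1; // short form: one octet holds the length directly
        }

        // long form: count the significant octets of the length value
        int octets = 0;
        for ( int remaining = valueFieldSz; remaining > 0; remaining >>>= 8 )
        {
            octets++;
        }

        return 1 + octets; // header octet + length octets
    }
}
```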
+ </subsection>
+
+ <subsection name="Buffered Vs. Streamed PrimitiveTuples">
+ <p>
+ As we mentioned before, there are two kinds of primitive Tuples.
+ Those that keep their value in a buffer within the TLV Tuple object,
+ in which case it is buffered within memory, and those that stream
+ the value to disk and store a reference to the value on disk. These
+ two beasts are so different it makes sense to differentiate between
+ them using subclasses. Let's take a look at a BufferedTuple which
+ is the simplest one.
+ </p>
+
+<source>
+public class BufferedTuple extends PrimitiveTuple
+{
+ /** contains ByteBuffers which contain parts of the value */
+ private final ArrayList value = new ArrayList();
+ /** pre-fab final unmodifiable wrapper around our modifiable list */
+ private final List unmodifiable = Collections.unmodifiableList( value );
+
+ public final boolean isBuffered()
+ {
+ return true;
+ }
+
+ /**
+ * Gets the value of this Tuple as a List of ByteBuffers.
+ *
+ * @return a list of ByteBuffers containing parts of the value
+ */
+ public final List getValue()
+ {
+ return unmodifiable;
+ }
+}
+</source>
+
+ <p>
+ The implementation introduces a final <code>getValue()</code> method
+ which returns an unmodifiable wrapper around a modifiable list of
+ ByteBuffers. The <code>isBuffered()</code> method is made final and
+ implemented to return true all the time. This is easy so let's now
+ take a look at the StreamedTuple implementation.
+ </p>
+
+<source>
+public abstract class StreamedTuple extends PrimitiveTuple
+{
+ public final boolean isBuffered()
+ {
+ return false;
+ }
+
+ // might experiment with a getURL to represent the source of
+ // the data stream - we need to discuss this on the list
+
+ /**
+ * Depending on the backing store used for accessing streamed data there
+ * may need to be multiple subclasses that implement this method.
+ *
+ * @return an InputStream that can be used to read this Tuple's streamed
+ * value data
+ */
+ public abstract InputStream getValueStream();
+
+ // another question is whether or not to offer a readable Channel instead
+ // of an InputStream? This is another topic for discussion.
+}
+</source>
+
+ <p>
+ At this point we know that there could be multiple ways to implement
+ this kind of StreamedTuple. Notice, though, that the value is accessed
+ through a stream provided by the Tuple. This way the large value
+ stored on disk need not all be kept in memory at one time during the
+ decode or encode process.
+ </p>
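One way such a StreamedTuple could be realized is with a file backed subclass along these lines. The class name, constructor, and the temporary file scheme are all assumptions for illustration, not settled design; the list discussion mentioned in the code comments above would decide the real shape.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical file backed streamed tuple: a large primitive value is
 * written to a file during decoding and read back on demand, so only a
 * constant sized footprint stays in memory.
 */
class FileStreamedTuple
{
    /** the file holding this Tuple's value bytes */
    private final File valueFile;

    FileStreamedTuple( File valueFile )
    {
        this.valueFile = valueFile;
    }

    /** streamed tuples are never buffered in memory */
    public boolean isBuffered()
    {
        return false;
    }

    /**
     * Opens a fresh stream on each call so the value can be re-read
     * during both the decode and encode processes.
     */
    public InputStream getValueStream() throws IOException
    {
        return new FileInputStream( valueFile );
    }
}
```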
+
+ </subsection>
+
+ <p>
+ Some code will be removed from today's Tuple class during the
+ refactoring and kept in a TupleUtils class. Functionality like
+ the encoding and decoding of Tuple fields and tag cooking can be
+ offloaded to this class.
+ </p>
+ </section>
+
+ <section name="Notes">
+ <p>
+ By far the largest part of the refactoring effort is in introducing
+ this new hierarchy and introducing some patterns that improve the
+ maintainability of the code like the State pattern. Other minor
+ details for this dev cycle are discussed below.
+ </p>
+
+ <subsection name="Termination Tuples">
+ <p>
+ A lot of effort is made to track the position of a Tuple within a
+ PDU. This is why we have methods like getTagStartIndex(). We want
+ to know where the first byte of a Tuple's tag is within a PDU. This
+ positional accounting enables better error reporting when problems
+ arise. It also allows us to detect when we start and stop
+ processing a PDU.
+ </p>
+
+ <p>
+ The minimum amount of information needed to track the position of a
+ Tuple within a PDU or the start and stop points of a PDU is to have
+ the Tuple's tag start index, and the lengths of fields within the
+ Tuple.
+ </p>
+
+ <p>
+ In a decoder for example we know that we've processed the last
+ topmost Tuple of a PDU when we get a Tuple whose <code>
+ getTagStartIndex()</code> returns 0. <b>WARNING</b>: AbstractTuple
+ should default the tag start index to -1 so it cannot
+ be interpreted as a terminator.
+ </p>
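The termination rule above can be captured in a tiny predicate, sketched here with illustrative names. Only an explicit index of 0 signals the topmost Tuple of a PDU; the -1 default keeps a partially initialized Tuple from being mistaken for a terminator.

```java
/**
 * Sketch of the PDU termination check: a Tuple whose tag starts at
 * index 0 is the topmost Tuple of the PDU, so delivering it marks the
 * end of one unit of processing.
 */
class TerminationCheck
{
    static boolean isPduTerminator( int tagStartIndex )
    {
        // -1 means the index was never set; only an explicit 0 terminates
        return tagStartIndex == 0;
    }
}
```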
+ </subsection>
+
+ <subsection name="New Coherent Replacement for Stateful Codec API">
+ <p>
+ There have been many complaints about the codec API being too
+ generic or the callback mechanism being somewhat unintuitive.
+ Perhaps we can work on more specific interfaces which incorporate
+ the concepts of producer and consumer. Plus let's see if we can
+ make these interfaces specific so we don't have ugly code and casts
+ all over the place.
+ </p>
+
+ <p>
+ Also in the end we want to do away with this codec API which was
+ originally intended to fuse back into commons. I've abandoned this
+ idea because it is too difficult to make all parties happy. The
+ best thing to do is create our own interfaces that fit well and
+ enable them to be wrapped for other APIs. Hence going towards custom
+ codec APIs is not an issue. The old codec stuff can be pushed into
+ the protocol framework API.
+ </p>
+
+ <p>
+ Furthermore, at the end of the day we want there to be a single runtime
+ jar without any dependencies for the ASN.1 stuff. That means no more
+ separate codec API jar as there is today within the ASN.1 project.
+ </p>
+
+ <p>
+ Some new producer consumer interface ideas are listed below:
+ </p>
+
+ <ul>
+ <li>
+ BufferConsumer: consumes ByteBuffers. Something like <code>void
+ consume(ByteBuffer bb)</code> comes to mind. Perhaps even with
+ overloads to take a list or array of BBs.
+ </li>
+
+ <li>
+ TupleProducer: generates Tuples (often also a BufferConsumer).
+ Something like <code>void setConsumer(TupleConsumer consumer)</code>
+ comes to mind.
+ </li>
+
+ <li>
+ TupleConsumer: consumes Tuples generated by a TupleProducer.
+ Something like <code>void consume(Tuple tlv)</code> comes to mind.
+ </li>
+
+ <li>
+ MessageProducer: produces populated message stubs
+ </li>
+ </ul>
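A minimal sketch of these producer/consumer interfaces might look like the following. Every name and signature here is a proposal for discussion, not existing API, and Tuple stands in for the interface proposed earlier.

```java
import java.nio.ByteBuffer;

/** stand-in for the Tuple interface proposed earlier in this document */
interface Tuple {}

/** consumes buffers of encoded TLV data as they arrive off the wire */
interface BufferConsumer
{
    void consume( ByteBuffer bb );
}

/** consumes Tuples generated by a TupleProducer */
interface TupleConsumer
{
    void consume( Tuple tlv );
}

/** produces Tuples, delivering each one to a registered consumer */
interface TupleProducer
{
    void setConsumer( TupleConsumer consumer );
}
```

With specific types like these, a decoding pipeline chains a BufferConsumer/TupleProducer to a TupleConsumer without the ugly casts the generic callback API requires.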
+ </subsection>
+
+ <subsection name="Possibly Merging TupleNode and Tuple">
+ <p>
+ Right now to build Tuple trees we use yet another class to wrap
+ Tuples called TupleNode. This kept the contents of the Tuple
+ class less congested. The old Tuple class will no longer exist, so
+ the congestion issue is no longer valid. The question now is: is it
+ worth keeping parent child methods in TupleNode when creating trees
+ while paying for extra object creation?
+ </p>
+
+ <p>
+ Note that the TupleNode methods are not required on Tuple to process
+ a byte stream of encoded TLV data in a sax-like fashion. These
+ methods are only required for higher level operations like visitations
+ from visitors during the encoding process. The real question is
+ whether we will make Tuple impure to save a little time so we don't
+ have to create TupleNode objects to wrap Tuples and model the
+ hierarchy. This is something that needs to be discussed.
+ </p>
+
+ <p>
+ Contrary to the purist approach of keeping Tuple and TupleNode
+ separate one can merge the two. A codec need not honor these methods
+ by building the tree. Meaning these tree node (TupleNode) methods
+ may simply return null. If these methods are honored then it is the
+ intent of the codec to build a tree. If the tree is built the
+ processing is more like DOM and if not then it is more like SAX. We
+ should not tax the DOM like processing use case by forcing the need
+ to create extra wrappers, while still accommodating the purist view.
+ </p>
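A hedged sketch of such a merged class, with illustrative names, might look like this: the tree methods are present but a SAX-like codec simply never populates them, so no wrapper objects are created unless DOM-like processing is wanted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Sketch of the merged approach: tree navigation lives on the tuple
 * itself, and a codec that processes the stream SAX-style just leaves
 * the parent and children unset.
 */
class MergedTuple
{
    /** parent node, or null when no tree is being built */
    private MergedTuple parent;

    /** child tuples; lazily created only when a tree is being built */
    private List children;

    MergedTuple getParent()
    {
        return parent;
    }

    /** @return child tuples, or an empty list in SAX-like processing */
    List getChildren()
    {
        return children == null ? Collections.EMPTY_LIST : children;
    }

    /** called only by codecs that intend to build a DOM-like tree */
    void addChild( MergedTuple child )
    {
        if ( children == null )
        {
            children = new ArrayList();
        }
        child.parent = this;
        children.add( child );
    }
}
```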
+ </subsection>
+
+ <subsection name="Removing the Digester Concept">
+ <p>
+ I don't know what I was thinking when I devised this rule based
+ approach similar to the Digester in commons. This was a big mistake
+ and IMO one of the reasons why we have performance issues. This
+ datastructure can be removed entirely from upper layers that depend
+ on it.
+ </p>
+
+ <p>
+ Granted, this means we are going to have to weave our own classes
+ once again for handling LDAP specific PDUs; however I think this will
+ be easy to do. I will essentially rewrite the LDAP provider based on
+ our runtime to hardcode the switching rather than using this rule
+ based triggering approach. The new approach is also going to
+ simplify the code significantly, making it more maintainable.
+ Hopefully these changes will also speed up the code since fewer
+ objects will need to be created every time a decoder is instantiated.
+ </p>
+ </subsection>
+
+ <subsection name="It's Time For DER and CER">
+ <p>
+ We need to find a way to make the rules used while decoding and
+ encoding Tuples pluggable. This way we can change the rules to
+ encode as generic BER, or as reduced BER for increased performance
+ in the case of specific protocol needs. DER likewise is a reduced
+ set of BER with restrictions on the encoding and range of values
+ that can be interpreted from primitive values. If the pluggability
+ is there, the runtime is a flexible TLV Tuple codec that can change
+ the rules used to handle the substrate.
+ </p>
+
+ <p>
+ We could easily have BerDecoder, CerDecoder and even protocol specific
+ decoders such as an LdapBerDecoder for those BER decoding rules that
+ only apply to LDAP.
+ </p>
+ </subsection>
+
+ </section>
+ </body>
+</document>