Posted to commits@directory.apache.org by ak...@apache.org on 2005/01/15 22:00:52 UTC

svn commit: r125307 - /incubator/directory/asn1/trunk/xdocs/index.xml /incubator/directory/asn1/trunk/xdocs/refactor.xml

Author: akarasulu
Date: Sat Jan 15 13:00:52 2005
New Revision: 125307

URL: http://svn.apache.org/viewcvs?view=rev&rev=125307
Log:
updates to site with docs on how we are going to refactor asn1 to make it better and faster
Added:
   incubator/directory/asn1/trunk/xdocs/refactor.xml
Modified:
   incubator/directory/asn1/trunk/xdocs/index.xml

Modified: incubator/directory/asn1/trunk/xdocs/index.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/index.xml?view=diff&rev=125307&p1=incubator/directory/asn1/trunk/xdocs/index.xml&r1=125306&p2=incubator/directory/asn1/trunk/xdocs/index.xml&r2=125307
==============================================================================
--- incubator/directory/asn1/trunk/xdocs/index.xml	(original)
+++ incubator/directory/asn1/trunk/xdocs/index.xml	Sat Jan 15 13:00:52 2005
@@ -45,6 +45,14 @@
           </td>
         </tr>
       </table>
+ 
+      <subsection name="Refactoring in 0.3 Branch">
+        <p>
+          For the next dev cycle we're radically refactoring the structure of
+          these modules.  For more information you can take a look 
+          <a href="./refactor.html">here</a>.
+        </p>
+      </subsection>
     </section>
 
     <section name="Motivation">

Added: incubator/directory/asn1/trunk/xdocs/refactor.xml
Url: http://svn.apache.org/viewcvs/incubator/directory/asn1/trunk/xdocs/refactor.xml?view=auto&rev=125307
==============================================================================
--- (empty file)
+++ incubator/directory/asn1/trunk/xdocs/refactor.xml	Sat Jan 15 13:00:52 2005
@@ -0,0 +1,591 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<document>
+  <properties>
+    <author email="akarasulu@apache.org">Alex Karasulu</author>
+    <title>Refactoring the ASN.1 Runtime</title>
+  </properties> 
+
+  <body>
+    <section name="Refactoring the ASN.1 Runtime">
+      <p>
+        The use of Snacc4J as the runtime ASN.1 BER codec for LDAP posed an 
+        intellectual property (IP) issue for the new Directory Project under 
+        incubation.  This led to the creation of our own implementation, and 
+        hence the Apache ASN.1 Runtime library was born.  
+      </p>
+
+      <p>
+        Before continuing any further it might be a good idea to read about 
+        the existing architecture to understand the changes that are being 
+        proposed.
+      </p>
+
+      <subsection name="High Level Goals and Changes">
+      <p>
+        The internal 0.2 release was the first successful attempt to produce a 
+        replacement for Snacc4J.  As of release 0.8 of ApacheDS it provides 
+        BER encoders and decoders for LDAP requests and responses.  The library
+        was designed with performance in mind, and some very good ideas were 
+        introduced and really put to the test.  However the library still has 
+        performance problems: the designs to make this a high performance
+        library were not followed through completely.  Furthermore the code base
+        is very difficult to maintain and needs some reorganization.  We hope to 
+        refactor the library so it is more efficient and easier to maintain 
+        while reducing the number of dependencies it has.  In the process we 
+        would like to introduce some new features and improvements which are 
+        listed below:
+      </p>
+      
+      <ul>
+        <li>
+          Better ByteBuffer utilization by splicing buffers instead of copying 
+          them.
+        </li>
+
+        <li>
+          Replace the current Tuple class with well defined Tuple interfaces: 
+          specifically we need to remove TLV field processing from a Tuple
+          as well as tag cooking functionality.  Tag cooking refers to the 
+          application of transformations that turn tag bytes into a 4 byte 
+          Java primitive int.  These functions need to be localized 
+          within utility classes.
+        </li>
+
+        <li>
+          Some BER based protocols only use a subset of the encoding rules.  
+          For example LDAP only uses definite length encodings for constructed 
+          tuples.  A reduced set of rules is much easier to code and maintain, 
+          and often performs significantly better than codecs designed for 
+          the entire rule set.  The key here however is to make sure that
+          the core of the codec can be replaced transparently without imposing
+          code changes. 
+        </li>
+
+        <li>
+          The Tuples of primitives like binary values store the Tag, Length 
+          and Value of the primitive TLV Tuple in memory.  Sometimes primitive 
+          values can be dangerously large for a server to encode or decode:  
+          primitive tuples could be blobs of large binaries like images.  If 
+          tuple values are larger than some application defined limit they 
+          ought to be streamed to disk rather than kept in main memory.  
+          Streaming to disk makes the server more efficient overall since it 
+          can maintain a constant sized decoding footprint.  However switching
+          to disk based storage will rightfully slow down the current operation
+          which involves a large primitive.  This is a tradeoff that should
+          be configurable by API users and ultimately ApacheDS administrators.
+        </li>
+
+        <li>
+          Better logging and error handling for codecs, with perhaps some
+          management interfaces to control the properties of codecs.
+        </li>
+
+        <li>
+          A single deployable artifact where the ber and codec jars are fused.  
+        </li>
+
+        <li>
+          Make the code easier to maintain while improving its structure.
+        </li>
+      </ul>
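The buffer splicing idea in the first bullet can be sketched with java.nio: ByteBuffer.duplicate() and slice() create views sharing the same backing store, so the value bytes are never copied.  This is a hedged illustration of the technique, not code from the library, and the class and method names are invented for the example:

```java
import java.nio.ByteBuffer;

public class BufferSplicer
{
    /**
     * Returns a zero-copy view over a region of a PDU buffer.  The view
     * shares the PDU's backing store: no value bytes are duplicated.
     */
    public static ByteBuffer splice( ByteBuffer pdu, int offset, int length )
    {
        ByteBuffer view = pdu.duplicate();  // independent position/limit, shared contents
        view.position( offset );
        view.limit( offset + length );
        return view.slice();                // window over [offset, offset + length)
    }
}
```

Splicing a three byte value out of a five byte TLV this way touches only the buffer bookkeeping, never the payload bytes themselves.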
+      </subsection>
+
+      
+    </section>
+
+    <section name="Tuple Interface/Class Hierarchies">
+      <p>
+        Presently Tuples contain the functionality to decode and encode 
+        fields.  Tuples can even encode themselves to a buffer as BER or
+        DER.  A Tuple should be a simple bean and nothing more, yet today 
+        it is much more than that.  Hence one of our goals is to factor out 
+        this additional functionality.
+      </p>
+
+      <p>
+        A Tuple is a single class that acts more like a union of different 
+        types rather than using inheritance to differentiate.  There are 
+        distinct types of tuples, constructed versus primitive for example.  
+        Instead of using complex logic to determine what kind of Tuple an 
+        instance is, it is much better to differentiate Tuples into 
+        subclasses.  Hence we propose a new interface and implementation 
+        hierarchy for Tuples.
+      </p>
+
+      <p>
+        Let's start by proposing a minimal Tuple interface.
+      </p>
+
+<source>
+interface Tuple
+{
+    /**
+     * Gets the zero based index into a PDU where the first byte of this
+     * Tuple's tag resides.
+     *
+     * @return zero based index of Tag's first byte in the PDU
+     */ 
+    int getTagStartIndex();
+
+    /**
+     * Gets this TLV Tuple's Tag (T) as a type safe enumeration.
+     *
+     * @return type safe enumeration for the Tag
+     */
+    TagEnum getTag();
+
+    /**
+     * Gets whether or not this Tuple is constructed.
+     *
+     * @return true if the Tag is constructed false if it is primitive.
+     */
+    boolean isConstructed();
+}
+</source>
+
+      <p>
+        This interface gives the minimum information needed for a Tuple
+        that is not specific to another specialized type of Tuple, meaning
+        all Tuples share these methods.  We can also go a step further and
+        implement an AbstractTuple where protected members are used to 
+        implement these methods.  Note that isConstructed() will probably be
+        left abstract so subclasses can just return true or false.  For 
+        brevity this code is not shown but other classes in the sections below
+        will extend from AbstractTuple.
+      </p>
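A minimal AbstractTuple along these lines might look as follows.  This is a sketch: the field names are assumptions, the TagEnum stub is only a placeholder so the fragment stands on its own, and the `implements Tuple` clause from the real hierarchy is omitted for the same reason:

```java
/** placeholder for the type safe tag enumeration used by Tuple */
interface TagEnum
{
}

public abstract class AbstractTuple
{
    /**
     * Zero based index of the Tag's first byte within the PDU.  Defaults
     * to -1 so an unpositioned Tuple can never be mistaken for the PDU
     * terminator at index 0 (see the notes on termination Tuples below).
     */
    protected int tagStartIndex = -1;

    /** the Tag (T) of this TLV Tuple */
    protected TagEnum tag;

    public int getTagStartIndex()
    {
        return tagStartIndex;
    }

    public TagEnum getTag()
    {
        return tag;
    }

    /** left abstract so subclasses simply return true or false */
    public abstract boolean isConstructed();
}
```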
+
+      <subsection name="Primitive Vs. Constructed Tuples">
+        <p>
+          We need to go a step further and start differentiating between Tuples
+          that are primitive and those that are constructed.  In this step we
+          introduce two new abstract classes PrimitiveTuple and 
+          ConstructedTuple.
+        </p>
+
+        <p>
+          These two classes will be described below but one might ask why both
+          are still abstract.  This is because we need to differentiate further
+          between buffered versus streamed Tuples in the case of primitive
+          Tuples.  For constructed Tuples we need to differentiate between
+          definite length versus indefinite length Tuples.  With our approach,
+          only the leaf nodes of the inheritance hierarchy will be concrete.
+          Below is the definition for the PrimitiveTuple.
+        </p>
+
+<source>
+public abstract class PrimitiveTuple extends AbstractTuple
+{
+    /** the number of bytes used to compose the Tuple's length field */
+    protected int lengthFieldSz = 0;
+    /** the number of bytes used to compose the Tuple's value field */
+    protected int valueFieldSz = 0;
+
+    ...
+
+    public final boolean isConstructed()
+    {
+        return false;
+    }
+
+    /**
+     * Gets whether or not this Tuple's value is buffered in memory or 
+     * streamed to disk.
+     *
+     * @return true if the value is buffered in memory, false if it is streamed
+     * to disk
+     */
+    public abstract boolean isBuffered();
+
+    /**
+     * Gets the number of bytes in the length (L) field of this TLV Tuple.
+     *
+     * @return number of bytes for the length
+     */
+    public final int getLengthFieldSize()
+    {
+        return lengthFieldSz;
+    }
+
+    /**
+     * Gets the number of bytes in the value (V) field of this TLV Tuple.
+     *
+     * @return number of bytes for the value
+     */
+    public final int getValueFieldSize()
+    {
+        return valueFieldSz;
+    }
+
+    ... 
+}
+</source>
+        <p>
+          This abstract class adds two new concrete methods for tracking the
+          size of the length and value fields.  Constructed Tuples may not
+          necessarily have a length value associated with them if they are
+          of the indefinite form.  Furthermore the value of a constructed
+          Tuple is the nested child Tuples subordinate to it.  So there
+          is no need to track the value for anything other 
+          than primitive Tuples.
+        </p>
+
+        <p>
+          Note that the isConstructed() method is implemented as final and 
+          always returns false for this lineage of Tuples.  A final modifier 
+          on the method makes sense and sometimes helps the compiler inline 
+          this method so we don't always pay a price for using it in addition 
+          to subclassing.  A new abstract method isBuffered() is introduced 
+          which is discussed in detail within the Buffered Vs. Streamed section.
+        </p>
+
+        <p> 
+          Now let's take a look at the ConstructedTuple abstract class.
+        </p>
+
+<source>
+public abstract class ConstructedTuple extends AbstractTuple
+{
+    public final boolean isConstructed()
+    {
+        return true;
+    }
+
+    /**
+     * Gets whether or not the length of this constructed Tuple is of the 
+     * definite form or of the indefinite length form.
+     *
+     * @return true if the length is definite, false if the length is of the
+     * indefinite form
+     */
+    public abstract boolean isLengthDefinate();
+}
+</source>
+
+        <p>
+          ConstructedTuple implements the <code>isConstructed()</code> method 
+          as final since it will always return true for this lineage of 
+          Tuples.  Also a new abstract method isLengthDefinate() is introduced 
+          to see if the Tuple uses the indefinite length form or not.  
+        </p>
+      </subsection>
+
+      <subsection name="Definite Vs. Indefinite Length">
+        <p>
+          The ConstructedTuple can be further differentiated into two 
+          subclasses to represent definite and indefinite length constructed 
+          TLV Tuples.  The indefinite form does not have a length value 
+          associated with it whereas the definite length form does.  Let's
+          explore the concrete IndefinateLength definition.
+        </p>
+
+<source>
+public class IndefinateLength extends ConstructedTuple
+{
+    public final boolean isLengthDefinate()
+    {
+        return false;
+    }
+}
+</source>
+
+        <p>
+          Yep, this is pretty simple.  There is very little to track for this
+          Tuple since most of the tracking is handled by its descendant Tuples.
+          The class is also concrete.  What about the DefinateLength 
+          implementation ...
+        </p>
+
+<source>
+public class DefinateLength extends ConstructedTuple
+{
+    /** the number of bytes used to compose the Tuple's length field */
+    protected int lengthFieldSz = 0;
+    /** the number of bytes used to compose the Tuple's value field */
+    protected int valueFieldSz = 0;
+
+    ...
+
+    public final boolean isLengthDefinate()
+    {
+        return true;
+    }
+
+    /**
+     * Gets the number of bytes in the length (L) field of this TLV Tuple.
+     *
+     * @return number of bytes for the length
+     */
+    public final int getLengthFieldSize()
+    {
+        return lengthFieldSz;
+    }
+
+    /**
+     * Gets the number of bytes in the value (V) field of this TLV Tuple.
+     *
+     * @return number of bytes for the value
+     */
+    public final int getValueFieldSize()
+    {
+        return valueFieldSz;
+    }
+}
+</source>
+        <p>
+          Now this introduces two new concrete methods for getting the length
+          of the length field and the length of the value field.  A definite
+          length TLV has a valid value within the Length (L) field.  The value
+          of the length field is the length of the value field.  Hence the 
+          reason why we include both these concrete methods.
+        </p>
+      </subsection>
+
+      <subsection name="Buffered Vs. Streamed PrimitiveTuples">
+        <p>
+          As we mentioned before, there are two kinds of primitive Tuples:  
+          those that keep their value in a buffer within the TLV Tuple object,
+          in which case it is buffered within memory, and those that stream
+          the value to disk and store a reference to the value on disk.  These
+          two beasts are so different it makes sense to differentiate between
+          them using subclasses.  Let's take a look at a BufferedTuple which
+          is the simplest one.
+        </p>
+
+<source>
+public class BufferedTuple extends PrimitiveTuple
+{
+    /** contains ByteBuffers which contain parts of the value */
+    private final ArrayList value = new ArrayList();
+    /** pre-fab final unmodifiable wrapper around our modifiable list */
+    private final List unmodifiable = Collections.unmodifiableList( value );
+
+    public final boolean isBuffered()
+    {
+        return true;
+    }
+
+    /**
+     * Gets the value of this Tuple as a List of ByteBuffers.
+     *
+     * @return a list of ByteBuffers containing parts of the value
+     */
+    public final List getValue()
+    {
+        return unmodifiable;
+    }
+}
+</source>
+
+        <p>
+          The implementation introduces a final <code>getValue()</code> method
+          which returns an unmodifiable wrapper around a modifiable list of 
+          ByteBuffers.  The <code>isBuffered()</code> method is made final and
+          implemented to return true all the time.  This is easy so let's now
+          take a look at the StreamedTuple implementation.
+        </p>
+
+<source>
+public abstract class StreamedTuple extends PrimitiveTuple
+{
+    public final boolean isBuffered()
+    {
+        return false;
+    }
+
+    // might experiment with a getURL to represent the source of 
+    // the data stream - we need to discuss this on the list
+
+    /**
+     * Depending on the backing store used for accessing streamed data there
+     * may need to be multiple subclasses that implement this method.
+     *
+     * @return an InputStream that can be used to read this Tuple's streamed 
+     * value data
+     */
+    public abstract InputStream getValueStream();   
+
+    // another question is whether or not to offer a readable Channel instead
+    // of an InputStream?  This is another topic for discussion.
+}
+</source>        
+        
+        <p>
+          At this point we know that there could be multiple ways to implement
+          this kind of StreamedTuple.  Notice though that the value is accessed 
+          through a stream provided by the Tuple.  This way the large value 
+          stored on disk need not be kept in memory all at once during the
+          decode or encode process.
+        </p>
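One possible concrete subclass is sketched below.  The class name and the file-backed approach are assumptions for illustration only; in the real hierarchy it would extend StreamedTuple, which is omitted here so the sketch stands alone:

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

/** hypothetical streamed Tuple whose value lives in a temporary file */
public class FileStreamedTuple
{
    /** on-disk location of the value field's bytes */
    private final File valueFile;

    public FileStreamedTuple( File valueFile )
    {
        this.valueFile = valueFile;
    }

    public boolean isBuffered()
    {
        return false;
    }

    /**
     * Opens a fresh stream over the on-disk value so callers never need
     * the whole value in memory at once.
     */
    public InputStream getValueStream() throws IOException
    {
        return new BufferedInputStream( new FileInputStream( valueFile ) );
    }
}
```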
+
+      </subsection>
+    
+      <p>
+        Some code will be removed from the Tuple class of today during the 
+        refactoring and kept in a TupleUtils class.  Functionality like
+        the encoding and decoding of Tuple fields and tag cooking can be
+        offloaded to this class.
+      </p>
+    </section>
+
+    <section name="Notes">
+      <p>
+        By far the largest part of the refactoring effort is in introducing 
+        this new hierarchy and introducing some patterns that improve the
+        maintainability of the code like the State pattern.  Other minor 
+        details for this dev cycle are discussed below.
+      </p>
+
+      <subsection name="Termination Tuples">
+        <p>
+          A lot of effort is made to track the position of a Tuple within a
+          PDU.  This is why we have methods like getTagStartIndex().  We want
+          to know where the first byte of a Tuple's tag is within a PDU.  This
+          positional accounting enables better error reporting when problems
+          arise.  It also allows us to detect when we start and stop 
+          processing a PDU.  
+        </p>
+
+        <p>
+          The minimum amount of information needed to track the position of a
+          Tuple within a PDU or the start and stop points of a PDU is to have 
+          the Tuple's tag start index, and the lengths of fields within the 
+          Tuple.
+        </p>
+
+        <p>
+          In a decoder for example we know that we've processed the last 
+          topmost Tuple of a PDU when we get a Tuple whose <code>
+          getTagStartIndex()</code> returns 0.  <b>WARNING</b>: AbstractTuple
+          should default the value of the tag start index to -1 so it cannot 
+          be interpreted as a terminator.
+        </p>
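The terminator convention above can be captured in a couple of lines; the helper class and method names here are hypothetical:

```java
public class PduBoundary
{
    /** sentinel meaning the tag start index has not been assigned yet */
    public static final int UNSET = -1;

    /**
     * A topmost Tuple whose tag starts at index 0 marks the end of PDU
     * processing; the -1 default can never satisfy this test.
     */
    public static boolean isTerminator( int tagStartIndex )
    {
        return tagStartIndex == 0;
    }
}
```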
+      </subsection>
+
+      <subsection name="New Coherent Replacement for Stateful Codec API">
+        <p>
+          There have been many complaints about the codec API being too 
+          generic or the callback mechanism being somewhat unintuitive.
+          Perhaps we can work on more specific interfaces which incorporate
+          the concepts of producer and consumer.  Plus let's see if we can
+          make these interfaces specific so we don't have ugly code and casts
+          all over the place.
+        </p>
+
+        <p>
+          Also in the end we want to do away with this codec API which was
+          originally intended to fuse back into commons.  I've abandoned this
+          idea because it is too difficult to make all parties happy.  The 
+          best thing to do is create our own interfaces that fit well and 
+          enable them to be wrapped for other APIs.  Hence going towards custom
+          codec APIs is not an issue.  The old codec stuff can be pushed into
+          the protocol framework API.
+        </p>
+
+        <p>
+          Furthermore at the end of the day we want there to be a single runtime
+          jar without any dependencies for the ASN.1 stuff.  That means no more
+          codec API jar as it exists today within the ASN.1 project. 
+        </p>
+
+        <p>
+          Some new producer consumer interface ideas are listed below:
+        </p>
+
+        <ul>
+          <li>
+            BufferConsumer: consumes ByteBuffers. Something like <code>void
+            consume(ByteBuffer bb)</code> comes to mind.  Perhaps even with 
+            overloads to take a list or array of BBs.
+          </li>
+
+          <li>
+            TupleProducer: generates Tuples (often it is also a BufferConsumer).
+            Something like <code>void setConsumer(TupleConsumer consumer)</code>
+            comes to mind.
+          </li>
+
+          <li>
+            TupleConsumer: consumes Tuples generated by a TupleProducer.
+            Something like <code>void consume(Tuple tlv)</code> comes to mind.
+          </li>
+
+          <li>
+            MessageProducer: produces populated message stubs.
+          </li>
+        </ul>
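Wired together, the ideas above might take a shape like the following.  These are possible signatures only, not settled API, and the Tuple stub exists solely to make the sketch self-contained:

```java
import java.nio.ByteBuffer;

/** stand-in for the real Tuple type so this sketch compiles alone */
interface Tuple
{
}

/** consumes raw PDU bytes, e.g. as they arrive off the wire */
interface BufferConsumer
{
    void consume( ByteBuffer bb );
}

/** receives Tuples in the order a producer parses them */
interface TupleConsumer
{
    void consume( Tuple tlv );
}

/** generates Tuples and hands them to a registered consumer */
interface TupleProducer
{
    void setConsumer( TupleConsumer consumer );
}
```

A decoder implementing both BufferConsumer and TupleProducer would then sit naturally between the wire and higher level message handling.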
+      </subsection>
+
+      <subsection name="Possibly Merging TupleNode and Tuple">
+        <p>
+          Right now to build Tuple trees we use yet another class to wrap 
+          Tuples called TupleNodes.  This kept the contents of the Tuple
+          class less congested.  The old Tuple class will no longer exist so
+          the congestion issue is no longer valid.  The question now is, is it
+          worth keeping parent child methods in TupleNode when creating trees
+          while paying for extra object creation?
+        </p>
+
+        <p>
+          Note that the TupleNode methods are not required on Tuple to process
+          a byte stream of encoded TLV data in a SAX-like fashion.  These 
+          methods are only required for higher level operations like visitations
+          from visitors during the encoding process.  The question really is 
+          whether we will make Tuple impure to save a little time so we don't 
+          have to create TupleNode objects to wrap Tuples and model the 
+          hierarchy.  This is something that needs to be discussed.
+        </p>
+
+        <p>
+          Contrary to the purist approach of keeping Tuple and TupleNode 
+          separate, one can merge the two.  A codec need not honor these methods
+          by building the tree, meaning these tree node (TupleNode) methods
+          may simply return null.  If these methods are honored then it is the
+          intent of the codec to build a tree.  If the tree is built the 
+          processing is more like DOM and if not then it is more like SAX.  We
+          should not tax the DOM like processing use case by forcing the need 
+          to create extra wrappers merely to accommodate the purist view.
+        </p>
+      </subsection>
+
+      <subsection name="Removing the Digester Concept">
+        <p>
+          I don't know what I was thinking when I devised this rule based
+          approach similar to the Digester in commons.  This was a big mistake
+          and IMO one of the reasons why we have performance issues.  This 
+          data structure can be removed entirely, along with the upper layers 
+          that depend on it.
+        </p>
+
+        <p>
+          Granted this means we are going to have to once again weave our own
+          classes for handling LDAP specific PDUs, however I think this will be
+          easy to do.  I will essentially rewrite the LDAP provider based on
+          our runtime to hardcode the switching rather than using this rule 
+          based triggering approach.  The new approach is also going to 
+          simplify the code significantly, making it more maintainable.  
+          Hopefully these changes will also speed up the code since fewer
+          objects will need to be created every time a decoder is instantiated.
+        </p>
+      </subsection>
+
+      <subsection name="It's Time For DER and CER">
+        <p>
+          We need to find a way to make the rules used while decoding and 
+          encoding Tuples pluggable.  This way we can change the rules to 
+          encode as generic BER, or reduced BER (for increased performance 
+          in the case of specific protocol needs).  DER likewise is a reduced
+          set of BER with restrictions on the encoding and range of values 
+          that can be interpreted from primitive values.  If the pluggability
+          is there the runtime is a flexible TLV Tuple codec that can change
+          the rules used to handle the substrate.
+        </p>
+
+        <p>
+          We could easily have a BerDecoder, a CerDecoder, and even protocol 
+          specific decoders such as an LdapBerDecoder for the BER decoding 
+          rules that only apply to LDAP.
+        </p>
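A minimal shape for that pluggability might look like this; the interface name and methods are assumptions for illustration, not a proposed final API:

```java
/** hypothetical pluggable rule-set descriptor; names are illustrative only */
interface TlvRuleSet
{
    /** @return true if these rules accept the indefinite length form */
    boolean acceptsIndefiniteLengths();

    /** @return a human readable name for the rule set */
    String getName();
}

/** example reduced rule set: LDAP only allows definite length encodings */
class LdapBerRules implements TlvRuleSet
{
    public boolean acceptsIndefiniteLengths()
    {
        return false;
    }

    public String getName()
    {
        return "LDAP-BER";
    }
}
```

A decoder parameterized by such a rule set could reject indefinite length tuples up front when configured for LDAP, while a generic BerDecoder would accept them.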
+      </subsection>
+
+    </section>
+  </body>
+</document>