You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@harmony.apache.org by te...@apache.org on 2006/04/07 12:17:27 UTC
svn commit: r392237 [4/4] - in
/incubator/harmony/enhanced/classlib/trunk/modules:
beans/src/test/java/org/apache/harmony/tests/java/beans/
crypto/src/main/java/javax/crypto/ crypto/src/test/java/javax/crypto/
regex-beans-math/doc/ security/src/test/ja...
Modified: incubator/harmony/enhanced/classlib/trunk/modules/regex-beans-math/doc/Regexp.htm
URL: http://svn.apache.org/viewcvs/incubator/harmony/enhanced/classlib/trunk/modules/regex-beans-math/doc/Regexp.htm?rev=392237&r1=392236&r2=392237&view=diff
==============================================================================
--- incubator/harmony/enhanced/classlib/trunk/modules/regex-beans-math/doc/Regexp.htm (original)
+++ incubator/harmony/enhanced/classlib/trunk/modules/regex-beans-math/doc/Regexp.htm Fri Apr 7 03:17:21 2006
@@ -1,333 +1,333 @@
-<!DOCTYPE html public "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
-<html>
-<head>
-<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
-<LINK rel="Stylesheet" type="text/css" media="all" href="drl.css">
-<title>Design of the regex processing framework</title>
-</head>
-<body>
-
- <h1 align=center><a name="Top"></a>Regex Processing Framework</h1>
- <p class="TOCHeading"><A href="#Revision_History">Revision History</A></p>
- <p class="TOCHeading"><A href="#Disclaimer">Disclaimer</A></p>
- <p class="TOCHeading"><A href="#About_this_document">About this Document</A>
- </p>
- <p class="TOC"><A href="#Purpose">Purpose </A> </p>
- <p class="TOC"><A href="#Intended_Audience">Intended Audience </A> </p>
- <p class="TOC"><A href="#Documentation_Conventions">Documentation Conventions</A></p>
- <p class="TOCHeading"><a href="#Introduction">Introduction to Pattern Matching</a></p>
- <p class="TOCHeading"><a href="#AbstractSetIntro">AbstractSet Interface</a></p>
- <p class="TOC"><a href="#Characteristics">Characteristics</a></p>
- <p class="TOC"><a href="#MethodsUsed">Methods Used</a></p>
- <p class="TOC"><a href="#ClassHierarchy">Class Hierarchy</a></p>
- <p class="TOC"><a href="#Optimizations">Optimizations</a></p>
-
-<p class="TOCHeading"><a href="#UsageExamples">Usage Examples</a></p>
-<p class="TOCHeading"><A href="#References">References</A> </p>
- <H1><A name="Revision_History"></A>Revision History</H1>
- <TABLE width="100%">
- <TR>
- <TD class="TableHeading" width="25%"> Version </TD>
- <TD class="TableHeading" width="50%"> Version Information </TD>
- <TD class="TableHeading"> Date </TD>
- </TR>
- <TR>
- <TD class="TableCell" width="25%"> Initial version </TD>
- <TD class="TableCell" width="25%"> Nadya Morozova, Nikolay Kuznetsov: document
- created.</TD>
- <TD class="TableCell"> December 16, 2005</TD>
- </TR>
- </TABLE>
- <H1><A name="Disclaimer"></A>Disclaimer and Legal Information</H1>
-<P>Copyright 2005 The Apache Software Foundation or its licensors, as applicable.</P>
-<P>Licensed under the Apache License, Version 2.0 (the "License"); you
- may not use this file except in compliance with the License. You may obtain
- a copy of the License at <A href="http://www.apache.org/licenses/LICENSE-2.0">
- http://www.apache.org/licenses/LICENSE-2.0</A>. </P>
-<P>Unless required by applicable law or agreed to in writing, software distributed
- under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
- OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
- specific language governing permissions and limitations under the License.</P>
- <h1><a name="About_this_document"></a>About This Document</h1>
- <h2><a name="Purpose"></a>Purpose</h2>
- <p>This document describes the <code>java.util.regex</code> package delivered as part of
- the DRL (Dynamic Run-time Layer) initiative. This document provides an overview
- of the overall architecture with focus on the aspects improving performance.
- </p>
- <h2><a name="Intended_Audience"></a>Intended Audience</h2>
-
-<p>The target audience for the document includes a wide community of engineers
- interested in using regular expression package and in further work with the
- product to contribute to its development. The document assumes that readers
- are familiar with Regular expressions [<a href="#Fri02">1</a>, <a href="#MY60">2</a>,
- <a href="#THO68">3</a>], finite automata theory [<a href="#ASU86">4</a>], basic
- compiler techniques [<a href="#ASU86">4</a>] and the Java<a
-href="#*">*</a> programming language [<a href="#JavaSite">5</a>]. </p>
- <h2><a name="Documentation_Conventions"></a>Documentation Conventions</h2>
- <p>This document uses the <a
-href="conventions.htm">unified conventions</a> for the DRL documentation kit.</p>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
-
-<h1><a name="Introduction"></a>Introduction to Pattern Matching </h1>
-
-<p>To analyze text in search of sequences matching preset patterns, you can choose
- from a variety of techniques, such as the wide-spread <i>regular expressions</i>
- (RE) [<a href="#Fri02">1</a>], <em>exact string matching</em>, for example, the Boyer-Moore algorithm
- (BM) [<a href="#BM77">6</a>], and others. However, the RE engine is rather complex, and significantly
- impacts performance, whereas the exact string matching techniques have a limited
- application. </p>
-
-<p>For example, the regular expression <code>.*world</code> is used to find the
- last instance of <i>world</i> in a sentence. The BM string searching algorithm is more
- appropriate for solving the task than the RE technique, and is more effective
- from the performance perspective. To optimize using pattern matching techniques
- with different tasks, DRL provides a unified interface that applies to any part
- of a regular expression, including the whole expression. </p>
-
-<p>In terms of the Finite Automata theory, which is the basis of regular expression
- processing, a part of regular expression is a <i>node</i>. The DRL regex framework
- treats every distinctive part of a regular expression and the whole expression
- as a node. Each node implements the unified interface adjusted for a specific
- technique. For instance, for the regular expression in the example, the DRL
- framework includes a special class SequenceSet, which has a unified interface
- called <code>AbstractSet</code>, and implements the Boyer-Moore algorithm in
- searching for a word.</p>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
- <h1><a name="AbstractSetIntro"></a>AbstractSet Interface</h1>
-
-<p>The key feature of the DRL regex framework the single super interface, <code>AbstractSet</code>,
- shared by all types of nodes. Individual nodes are independent, so that every
- node of the automaton incorporates its own find-match algorithm optimized for
- a specific type of nodes. You can use methods implemented in the <code>AbstractSet</code>
- interface subclasses or override these methods to use your own more efficient
- algorithm.</p>
-
- <h2><a name="Characteristics"></a>Characteristics</h2>
-
-<p>The <code>AbstractSet</code> interface has the following key characteristics:
-</p>
-
-<ul>
- <li> <b>Unified structure:</b> General contract for all nodes including node
- chains. This enables the framework to support new features and optimizations
- for a specific construction of RE without impacting other parts of the framework.
- <li><b>Simplicity:</b> Every node of an automaton is simple and handles a specific
- optimization or functionality. The resulting automaton is straight-forward
- and includes no multi-purpose functionality. To enable support of new functionality,
- create a new class.
- <p class="note">NOTE: </p>
- <p class="notetext">Because new features and optimizations are implemented
- independently, expansion of the framework has no negative impact on the
- overall performance. </p>
- <li><b>Independence</b>:The matching and finding procedures are incorporated
- into the interface and require no external logic. The interface allows using
- the same find-match methods for all types of nodes, and at the same time enables
- optimizing the find-match algorithm for individual node types to improve performance.
-</ul>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
-<h2><a name="MethodsUsed"></a>Methods Used </h2>
-
-<p>The <code>AbstractSet</code> interface defines the matching and finding methods,
- and the service methods supporting them, as follows. </p>
- <p class="class">Service Methods</p>
-<dl>
- <dt>accept()</dt>
- <dd>Checks whether the current node matches the current string symbol(s) to
- ensure that the automaton can proceed with the current state. The method returns
- the number of accepted symbols. This method is designed for nodes representing
- tokens, which mostly use the default match procedure. To optimize the match
- procedure for a specific type of token, you only need to override the <code>accept()</code>
- method, see <a href="#Example1">Example 1</a>.</dd>
- <dt>first()</dt>
- <dd>Checks whether the first symbol of the current node intersects with previous
- one. If not, then the previous node is quantified possessively without <i>backtrack</i>
- (the option of restarting the search from the previous position). </dd>
- <dt>next()</dt>
- <dd>Returns the next node of the automaton. </dd>
-</dl>
-<p class="class">Matching and Finding Methods</p>
-<dl>
- <dt>matches()</dt>
- <dd>Runs the match procedure starting from the current state. This is the only
- method you need to override in order to introduce new functionality. By default,
- the <code>matches()</code> method does the following when working with terms (see <a href="#ClassHierarchy">Class
- Hierarchy</a>):
- <ol>
- <li> Calls the <code>accept()</code> method and proceeds if the method returns
- a positive value. </li>
- <li> Calls the match of the next node with the shift obtained from the <code>accept()</code>
- method.</li>
- <li> Returns <code>TRUE</code> in case of a successful match. </li>
- </ol>
- </dd>
- <dt>find()</dt>
- <dd>Checks whether the pattern can match any part of the string in the following
- way:
- <ol>
- <li>Finds the next position in the input string, for which the <code>accept()</code>
- method returns a positive value. If no matches are found, the method terminates
- and returns a negative value.</li>
- <li>From this position, runs the <code>matches()</code> method.</li>
- </ol>
- </dd>
- <dt>findBack()</dt>
- <dd>Does the same as the <code>find()</code> method but<code></code> starts
- from the end of the string or from the nearest new-line symbol. This method
- optimizes the find procedure when the current node of the pattern fits the
- rest of the string. In such cases, it is necessary to find the last occurrence
- of the pattern represented by the next node, as in the case of the <code>.*world
- </code> pattern. </dd>
-</dl>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
- <h2><a name="ClassHierarchy"></a>Class Hierarchy</h2>
-
-<p><a href="#figure1">Figure 1</a> shows the class hierarchy based on the <code>AbstractSet</code>
- interface. As mentioned in its <a href="#AbstractSetIntro">description</a>,
- the whole hierarchy is based on this interface. The other classes of the framework
- are divided into the following categories: </p>
-
-<ul>
- <li><b>Tokens or Leafs</b> (<code>LeafSet</code> in the figure): nodes representing
- ordinal symbols, substrings, ranges, character classes, and other basic units
- of regular expressions.
- <li><b>Quantifiers</b> (<code>QuantifierSet</code> in the figure): set of nodes responsible
- for quantification of terms. Terms are simple and usually represent only one
- symbol. Therefore, terms are quantified without recursion, and backtracks are
- processed on the basis of the underlying pattern length.
- <li><b>Groups </b>(<code>JointSet</code> in the figure): sets derived from parenthetic constructions.
- These nodes represent alternations of other sets.
- <li><b>Group Quantifiers </b>(<code>GroupQuantifier</code> in the figure): special
- quantifiers over Groups. Because the length of a groups can vary, backtracks
- require recursion.
-</ul>
-<p align=center><img border=0 alt="Hierarchy of classes in regexp framework" src="images/picture1.gif"></p>
- <p class="special"><a name="figure1"></a>Figure 1. Class Hierarchy</p>
-
-<p>Figure 1 displays only the basics of the regex framework. This framework also
- includes other nodes responsible for different optimizations. For more details,
- see the comments inlined in code. </p>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
- <h2><a name="Optimizations"></a>Optimizations</h2>
-
-<p>In the current implementation, most optimizations are based on the node representation
- to improve efficiency of the framework. It is noteworthy that an optimal finite
- automaton is built during compile time, so that the constructed automaton does
- not spend additional time on decision-making overhead during match time. The
- regex framework optimizations improve different aspects, such as: </p>
-<ul>
- <li> <b>Character or String find methods: </b>The methods <code>find()</code>
- and <code>findBack()</code> of certain nodes use exact pattern matching algorithms,
- specifically, the Boyer-Moore fast string search algorithm.
- <li> <b>Quantified dot term:</b> This term (<code>.*</code>) covers the rest
- of the string and can therefore run <code>findBack()</code> from the end of
- the string. If the <code>findBack()</code> method of the following node implements
- a special algorithm, the pattern matching speed for this chain increases,
- otherwise the operation runs at the same speed.
- <li> <b>Quantified non-intersecting states:</b> If the quantified node does
- not intersect with the first character of the next one, the framework treats
- the quantifier as the possessive quantifier because no backtracks can occur.
- Possessive quantification is based on loops instead of recursion and works
- faster.
-</ul>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
-
-<h1><a name="UsageExamples"></a>Usage Examples</h1>
-<P>This part on the document illustrates usage of the DRL regex framework. </P>
-<h3><a name="Example1"></a>Example 1</h3>
-<p>This example illustrates using the <code>CharSet</code> class, which includes all nodes accepting characters to create a new type of tokens. For that, the class uses the <code>LeafSet</code> class, which is the basic class for tokens. This example shows that the only method you need to override in order to present a new type of tokens is the <code>accept()</code> method. </p>
-<pre>class CharSet extends LeafSet {
- ...
- public int accept(int strIndex, CharSequence testString) {
- // checks that the current symbol of the input string is the one this
- // instance represents and returns 1 (the number of
- // accepted symbols) or -1 if accept fails:
- return (this.ch == testString.charAt(strIndex)) ? 1 : -1;
- }
- ...
-}
-</pre>
-<h3>Example 2</h3>
-<p>The following example demonstrates independent implementation of the <code>find()</code> method for <code>SequenceSet</code>.</p>
-<p class="note">Note: </p>
-<p class="notetext">This changes the find procedure for nodes representing character
- sequences and at the same time does not affect the find-match algorithms of
- other types of nodes. </p>
-<pre>class SequenceSet extends LeafSet {
- ...
- protected int indexOf(CharSequence str, int from) {
- // Boyer-Moore algorithm
- ...
- }
-
- public int find(int strIndex, CharSequence testString,
- MatchResultImpl matchResult) {
- ...
- while (strIndex <= strLength) {
- // call the fast search metod instead of default implementation
- strIndex = indexOf(testStr, strIndex);
- if (strIndex < 0) {
- return -1;
- }
- if (next.matches(strIndex + charCount, testString, matchResult) >= 0) {
- return strIndex;
- }
- strIndex++;
- }
- return -1;
- }
- ...
-}
-</pre>
-
-
-<h3>Example 3 </h3>
-<p>This example illustrates how to turn the match procedure of the <code>.*</code>
- patterns into the find procedure, which is potentially faster. </p>
-
-<pre>class DotQuantifierSet extends LeafQuantifierSet {
- ...
- public int matches(int stringIndex, CharSequence testString,
- MatchResultImpl matchResult) {
- ...
- // find line terminator, since .* represented by this node accepts all characters
- // except line terminators
- int startSearch = findLineTerminator(stringIndex, strLength, testString);
- ...
- // run findBack method of the next node, because the previous
- // part of the string is accepted by .* and no matter where
- // the next part of pattern is found, the procedure works OK.
- return next.findBack(stringIndex, startSearch, testString, matchResult);
- }
-}
-</pre>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
-
-<h1><a name="References"></a>References</h1>
- <p>[<a name="Fri02"></a>1] Jeffrey E. F. Friedl., <em>Mastering regular expressions</em>,
- 2nd Edition., July 2002, O'Reilly, ISBN 0-596-00289-0 </p>
-<p>[<a name="MY60"></a>2] McNaughton, R. and Yamada, H. <em>Regular Expressions
- and State Graphs for Automata</em>, IRA Trans. on Electronic Computers, Vol.
- EC-9, No. 1, Mar. 1960, pp 39-47. </p>
-
-<p>[<a name="THO68"></a>3] Thompson, K., <em>Regular Expression search Algorithm</em>,
- Communication ACM 11:6 (1968), pp. 419-422. </p>
-
-<p>[<a name="ASU86"></a>4] Alfred V. Aho, Ravi Sethi, Jeffrey
- D. Ullman, <em>Compilers, Principles Techniques and Tools</em>, Addison-Wesley
- Publishing Company, Inc., 1985, ISBN 0-201-10088-6 </p>
-
-<p>[<a name="JavaSite"></a>5] Java Technology site, <a href="http://java.sun.com" target="_blank">http://java.sun.com</a>
-</p>
-
-<p>[<a name="BM77"></a>6] R. Boyer and S. Moore. A fast string searching algorithm. C. ACM, 20:762-772, 1977. </p>
-
-<p>[<a name="KMP77"></a>7] D.E. Knuth, .I. Morris, and V. Pratt. Fast pattern
- matching in strings. SIAM J on Computing, 6:323-350, 1977. </p>
-<P class="backtotop"><A href="#Top">Back to Top</A></P>
-<P class="backtotop"> </P>
-
-
-<P><A name="*">*</A> Other brands and names are the property of their respective
- owners.</P>
-</body>
-</html>
+<!DOCTYPE html public "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
+<LINK rel="Stylesheet" type="text/css" media="all" href="drl.css">
+<title>Design of the regex processing framework</title>
+</head>
+<body>
+
+ <h1 align=center><a name="Top"></a>Regex Processing Framework</h1>
+ <p class="TOCHeading"><A href="#Revision_History">Revision History</A></p>
+ <p class="TOCHeading"><A href="#Disclaimer">Disclaimer</A></p>
+ <p class="TOCHeading"><A href="#About_this_document">About this Document</A>
+ </p>
+ <p class="TOC"><A href="#Purpose">Purpose </A> </p>
+ <p class="TOC"><A href="#Intended_Audience">Intended Audience </A> </p>
+ <p class="TOC"><A href="#Documentation_Conventions">Documentation Conventions</A></p>
+ <p class="TOCHeading"><a href="#Introduction">Introduction to Pattern Matching</a></p>
+ <p class="TOCHeading"><a href="#AbstractSetIntro">AbstractSet Interface</a></p>
+ <p class="TOC"><a href="#Characteristics">Characteristics</a></p>
+ <p class="TOC"><a href="#MethodsUsed">Methods Used</a></p>
+ <p class="TOC"><a href="#ClassHierarchy">Class Hierarchy</a></p>
+ <p class="TOC"><a href="#Optimizations">Optimizations</a></p>
+
+<p class="TOCHeading"><a href="#UsageExamples">Usage Examples</a></p>
+<p class="TOCHeading"><A href="#References">References</A> </p>
+ <H1><A name="Revision_History"></A>Revision History</H1>
+ <TABLE width="100%">
+ <TR>
+ <TD class="TableHeading" width="25%"> Version </TD>
+ <TD class="TableHeading" width="50%"> Version Information </TD>
+ <TD class="TableHeading"> Date </TD>
+ </TR>
+ <TR>
+ <TD class="TableCell" width="25%"> Initial version </TD>
+ <TD class="TableCell" width="25%"> Nadya Morozova, Nikolay Kuznetsov: document
+ created.</TD>
+ <TD class="TableCell"> December 16, 2005</TD>
+ </TR>
+ </TABLE>
+ <H1><A name="Disclaimer"></A>Disclaimer and Legal Information</H1>
+<P>Copyright 2005 The Apache Software Foundation or its licensors, as applicable.</P>
+<P>Licensed under the Apache License, Version 2.0 (the "License"); you
+ may not use this file except in compliance with the License. You may obtain
+ a copy of the License at <A href="http://www.apache.org/licenses/LICENSE-2.0">
+ http://www.apache.org/licenses/LICENSE-2.0</A>. </P>
+<P>Unless required by applicable law or agreed to in writing, software distributed
+ under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+ OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.</P>
+ <h1><a name="About_this_document"></a>About This Document</h1>
+ <h2><a name="Purpose"></a>Purpose</h2>
+ <p>This document describes the <code>java.util.regex</code> package delivered as part of
+ the DRL (Dynamic Run-time Layer) initiative. This document provides an overview
+ of the overall architecture with focus on the aspects improving performance.
+ </p>
+ <h2><a name="Intended_Audience"></a>Intended Audience</h2>
+
+<p>The target audience for the document includes a wide community of engineers
+ interested in using regular expression package and in further work with the
+ product to contribute to its development. The document assumes that readers
+ are familiar with Regular expressions [<a href="#Fri02">1</a>, <a href="#MY60">2</a>,
+ <a href="#THO68">3</a>], finite automata theory [<a href="#ASU86">4</a>], basic
+ compiler techniques [<a href="#ASU86">4</a>] and the Java<a
+href="#*">*</a> programming language [<a href="#JavaSite">5</a>]. </p>
+ <h2><a name="Documentation_Conventions"></a>Documentation Conventions</h2>
+ <p>This document uses the <a
+href="conventions.htm">unified conventions</a> for the DRL documentation kit.</p>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+
+<h1><a name="Introduction"></a>Introduction to Pattern Matching </h1>
+
+<p>To analyze text in search of sequences matching preset patterns, you can choose
+ from a variety of techniques, such as the wide-spread <i>regular expressions</i>
+ (RE) [<a href="#Fri02">1</a>], <em>exact string matching</em>, for example, the Boyer-Moore algorithm
+ (BM) [<a href="#BM77">6</a>], and others. However, the RE engine is rather complex, and significantly
+ impacts performance, whereas the exact string matching techniques have a limited
+ application. </p>
+
+<p>For example, the regular expression <code>.*world</code> is used to find the
+ last instance of <i>world</i> in a sentence. The BM string searching algorithm is more
+ appropriate for solving the task than the RE technique, and is more effective
+ from the performance perspective. To optimize using pattern matching techniques
+ with different tasks, DRL provides a unified interface that applies to any part
+ of a regular expression, including the whole expression. </p>
+
+<p>In terms of the Finite Automata theory, which is the basis of regular expression
+ processing, a part of regular expression is a <i>node</i>. The DRL regex framework
+ treats every distinctive part of a regular expression and the whole expression
+ as a node. Each node implements the unified interface adjusted for a specific
+ technique. For instance, for the regular expression in the example, the DRL
+ framework includes a special class SequenceSet, which has a unified interface
+ called <code>AbstractSet</code>, and implements the Boyer-Moore algorithm in
+ searching for a word.</p>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+ <h1><a name="AbstractSetIntro"></a>AbstractSet Interface</h1>
+
+<p>The key feature of the DRL regex framework the single super interface, <code>AbstractSet</code>,
+ shared by all types of nodes. Individual nodes are independent, so that every
+ node of the automaton incorporates its own find-match algorithm optimized for
+ a specific type of nodes. You can use methods implemented in the <code>AbstractSet</code>
+ interface subclasses or override these methods to use your own more efficient
+ algorithm.</p>
+
+ <h2><a name="Characteristics"></a>Characteristics</h2>
+
+<p>The <code>AbstractSet</code> interface has the following key characteristics:
+</p>
+
+<ul>
+ <li> <b>Unified structure:</b> General contract for all nodes including node
+ chains. This enables the framework to support new features and optimizations
+ for a specific construction of RE without impacting other parts of the framework.
+ <li><b>Simplicity:</b> Every node of an automaton is simple and handles a specific
+ optimization or functionality. The resulting automaton is straight-forward
+ and includes no multi-purpose functionality. To enable support of new functionality,
+ create a new class.
+ <p class="note">NOTE: </p>
+ <p class="notetext">Because new features and optimizations are implemented
+ independently, expansion of the framework has no negative impact on the
+ overall performance. </p>
+ <li><b>Independence</b>:The matching and finding procedures are incorporated
+ into the interface and require no external logic. The interface allows using
+ the same find-match methods for all types of nodes, and at the same time enables
+ optimizing the find-match algorithm for individual node types to improve performance.
+</ul>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+<h2><a name="MethodsUsed"></a>Methods Used </h2>
+
+<p>The <code>AbstractSet</code> interface defines the matching and finding methods,
+ and the service methods supporting them, as follows. </p>
+ <p class="class">Service Methods</p>
+<dl>
+ <dt>accept()</dt>
+ <dd>Checks whether the current node matches the current string symbol(s) to
+ ensure that the automaton can proceed with the current state. The method returns
+ the number of accepted symbols. This method is designed for nodes representing
+ tokens, which mostly use the default match procedure. To optimize the match
+ procedure for a specific type of token, you only need to override the <code>accept()</code>
+ method, see <a href="#Example1">Example 1</a>.</dd>
+ <dt>first()</dt>
+ <dd>Checks whether the first symbol of the current node intersects with previous
+ one. If not, then the previous node is quantified possessively without <i>backtrack</i>
+ (the option of restarting the search from the previous position). </dd>
+ <dt>next()</dt>
+ <dd>Returns the next node of the automaton. </dd>
+</dl>
+<p class="class">Matching and Finding Methods</p>
+<dl>
+ <dt>matches()</dt>
+ <dd>Runs the match procedure starting from the current state. This is the only
+ method you need to override in order to introduce new functionality. By default,
+ the <code>matches()</code> method does the following when working with terms (see <a href="#ClassHierarchy">Class
+ Hierarchy</a>):
+ <ol>
+ <li> Calls the <code>accept()</code> method and proceeds if the method returns
+ a positive value. </li>
+ <li> Calls the match of the next node with the shift obtained from the <code>accept()</code>
+ method.</li>
+ <li> Returns <code>TRUE</code> in case of a successful match. </li>
+ </ol>
+ </dd>
+ <dt>find()</dt>
+ <dd>Checks whether the pattern can match any part of the string in the following
+ way:
+ <ol>
+ <li>Finds the next position in the input string, for which the <code>accept()</code>
+ method returns a positive value. If no matches are found, the method terminates
+ and returns a negative value.</li>
+ <li>From this position, runs the <code>matches()</code> method.</li>
+ </ol>
+ </dd>
+ <dt>findBack()</dt>
+ <dd>Does the same as the <code>find()</code> method but<code></code> starts
+ from the end of the string or from the nearest new-line symbol. This method
+ optimizes the find procedure when the current node of the pattern fits the
+ rest of the string. In such cases, it is necessary to find the last occurrence
+ of the pattern represented by the next node, as in the case of the <code>.*world
+ </code> pattern. </dd>
+</dl>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+ <h2><a name="ClassHierarchy"></a>Class Hierarchy</h2>
+
+<p><a href="#figure1">Figure 1</a> shows the class hierarchy based on the <code>AbstractSet</code>
+ interface. As mentioned in its <a href="#AbstractSetIntro">description</a>,
+ the whole hierarchy is based on this interface. The other classes of the framework
+ are divided into the following categories: </p>
+
+<ul>
+ <li><b>Tokens or Leafs</b> (<code>LeafSet</code> in the figure): nodes representing
+ ordinal symbols, substrings, ranges, character classes, and other basic units
+ of regular expressions.
+ <li><b>Quantifiers</b> (<code>QuantifierSet</code> in the figure): set of nodes responsible
+ for quantification of terms. Terms are simple and usually represent only one
+ symbol. Therefore, terms are quantified without recursion, and backtracks are
+ processed on the basis of the underlying pattern length.
+ <li><b>Groups </b>(<code>JointSet</code> in the figure): sets derived from parenthetic constructions.
+ These nodes represent alternations of other sets.
+ <li><b>Group Quantifiers </b>(<code>GroupQuantifier</code> in the figure): special
+ quantifiers over Groups. Because the length of a groups can vary, backtracks
+ require recursion.
+</ul>
+<p align=center><img border=0 alt="Hierarchy of classes in regexp framework" src="images/picture1.gif"></p>
+ <p class="special"><a name="figure1"></a>Figure 1. Class Hierarchy</p>
+
+<p>Figure 1 displays only the basics of the regex framework. This framework also
+ includes other nodes responsible for different optimizations. For more details,
+ see the comments inlined in code. </p>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+ <h2><a name="Optimizations"></a>Optimizations</h2>
+
+<p>In the current implementation, most optimizations are based on the node representation
+ to improve efficiency of the framework. It is noteworthy that an optimal finite
+ automaton is built during compile time, so that the constructed automaton does
+ not spend additional time on decision-making overhead during match time. The
+ regex framework optimizations improve different aspects, such as: </p>
+<ul>
+ <li> <b>Character or String find methods: </b>The methods <code>find()</code>
+ and <code>findBack()</code> of certain nodes use exact pattern matching algorithms,
+ specifically, the Boyer-Moore fast string search algorithm.
+ <li> <b>Quantified dot term:</b> This term (<code>.*</code>) covers the rest
+ of the string and can therefore run <code>findBack()</code> from the end of
+ the string. If the <code>findBack()</code> method of the following node implements
+ a special algorithm, the pattern matching speed for this chain increases,
+ otherwise the operation runs at the same speed.
+ <li> <b>Quantified non-intersecting states:</b> If the quantified node does
+ not intersect with the first character of the next one, the framework treats
+ the quantifier as the possessive quantifier because no backtracks can occur.
+ Possessive quantification is based on loops instead of recursion and works
+ faster.
+</ul>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+
+<h1><a name="UsageExamples"></a>Usage Examples</h1>
+<P>This part on the document illustrates usage of the DRL regex framework. </P>
+<h3><a name="Example1"></a>Example 1</h3>
+<p>This example illustrates using the <code>CharSet</code> class, which includes all nodes accepting characters to create a new type of tokens. For that, the class uses the <code>LeafSet</code> class, which is the basic class for tokens. This example shows that the only method you need to override in order to present a new type of tokens is the <code>accept()</code> method. </p>
+<pre>class CharSet extends LeafSet {
+ ...
+ public int accept(int strIndex, CharSequence testString) {
+ // checks that the current symbol of the input string is the one this
+ // instance represents and returns 1 (the number of
+ // accepted symbols) or -1 if accept fails:
+ return (this.ch == testString.charAt(strIndex)) ? 1 : -1;
+ }
+ ...
+}
+</pre>
+<h3>Example 2</h3>
+<p>The following example demonstrates independent implementation of the <code>find()</code> method for <code>SequenceSet</code>.</p>
+<p class="note">Note: </p>
+<p class="notetext">This changes the find procedure for nodes representing character
+ sequences and at the same time does not affect the find-match algorithms of
+ other types of nodes. </p>
+<pre>class SequenceSet extends LeafSet {
+ ...
+ protected int indexOf(CharSequence str, int from) {
+ // Boyer-Moore algorithm
+ ...
+ }
+
+ public int find(int strIndex, CharSequence testString,
+ MatchResultImpl matchResult) {
+ ...
+ while (strIndex <= strLength) {
+ // call the fast search method instead of default implementation
+ strIndex = indexOf(testStr, strIndex);
+ if (strIndex < 0) {
+ return -1;
+ }
+ if (next.matches(strIndex + charCount, testString, matchResult) >= 0) {
+ return strIndex;
+ }
+ strIndex++;
+ }
+ return -1;
+ }
+ ...
+}
+</pre>
+
+
+<h3>Example 3 </h3>
+<p>This example illustrates how to turn the match procedure of the <code>.*</code>
+ patterns into the find procedure, which is potentially faster. </p>
+
+<pre>class DotQuantifierSet extends LeafQuantifierSet {
+ ...
+ public int matches(int stringIndex, CharSequence testString,
+ MatchResultImpl matchResult) {
+ ...
+ // find line terminator, since .* represented by this node accepts all characters
+ // except line terminators
+ int startSearch = findLineTerminator(stringIndex, strLength, testString);
+ ...
+ // run findBack method of the next node, because the previous
+ // part of the string is accepted by .* and no matter where
+ // the next part of pattern is found, the procedure works OK.
+ return next.findBack(stringIndex, startSearch, testString, matchResult);
+ }
+}
+</pre>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+
+<h1><a name="References"></a>References</h1>
+ <p>[<a name="Fri02"></a>1] Jeffrey E. F. Friedl., <em>Mastering regular expressions</em>,
+ 2nd Edition., July 2002, O'Reilly, ISBN 0-596-00289-0 </p>
+<p>[<a name="MY60"></a>2] McNaughton, R. and Yamada, H. <em>Regular Expressions
+ and State Graphs for Automata</em>, IRA Trans. on Electronic Computers, Vol.
+ EC-9, No. 1, Mar. 1960, pp 39-47. </p>
+
+<p>[<a name="THO68"></a>3] Thompson, K., <em>Regular Expression search Algorithm</em>,
+ Communication ACM 11:6 (1968), pp. 419-422. </p>
+
+<p>[<a name="ASU86"></a>4] Alfred V. Aho, Ravi Sethi, Jeffrey
+ D. Ullman, <em>Compilers, Principles Techniques and Tools</em>, Addison-Wesley
+ Publishing Company, Inc., 1985, ISBN 0-201-10088-6 </p>
+
+<p>[<a name="JavaSite"></a>5] Java Technology site, <a href="http://java.sun.com" target="_blank">http://java.sun.com</a>
+</p>
+
+<p>[<a name="BM77"></a>6] R. Boyer and S. Moore. A fast string searching algorithm. C. ACM, 20:762-772, 1977. </p>
+
+<p>[<a name="KMP77"></a>7] D.E. Knuth, .I. Morris, and V. Pratt. Fast pattern
+ matching in strings. SIAM J on Computing, 6:323-350, 1977. </p>
+<P class="backtotop"><A href="#Top">Back to Top</A></P>
+<P class="backtotop"> </P>
+
+
+<P><A name="*">*</A> Other brands and names are the property of their respective
+ owners.</P>
+</body>
+</html>
Modified: incubator/harmony/enhanced/classlib/trunk/modules/security/src/test/java/common/java/security/IdentityTest.java
URL: http://svn.apache.org/viewcvs/incubator/harmony/enhanced/classlib/trunk/modules/security/src/test/java/common/java/security/IdentityTest.java?rev=392237&r1=392236&r2=392237&view=diff
==============================================================================
--- incubator/harmony/enhanced/classlib/trunk/modules/security/src/test/java/common/java/security/IdentityTest.java (original)
+++ incubator/harmony/enhanced/classlib/trunk/modules/security/src/test/java/common/java/security/IdentityTest.java Fri Apr 7 03:17:21 2006
@@ -303,7 +303,7 @@
}
/**
- * verify Identity.toString(boolean) return string represention of identity
+ * verify Identity.toString(boolean) return string representation of identity
*/
public void testToStringboolean() throws Exception {
new IdentityStub("aaa").toString(false);