You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@qpid.apache.org by "Ken Giusti (Jira)" <ji...@apache.org> on 2020/06/01 21:43:00 UTC
[jira] [Commented] (DISPATCH-1682) Optimize the parse tree match algorithm to avoid O(N) lookup

    [ https://issues.apache.org/jira/browse/DISPATCH-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121354#comment-17121354 ] 

Ken Giusti commented on DISPATCH-1682:
--------------------------------------

A parse tree maps a token separated _pattern_ into a directed graph _tree_.

A pattern consists of an ordered sequence of one or more _tokens_
separated by a reserved _separator character_.

Example:

pattern = T0.T1.T2

Where Tn is a token and '.' is the separator character. '/' is also
considered a separator character but for this example we will use '.'.

A parse tree pattern may also include wildcard tokens. Two wildcard
tokens are defined:

'*': Match one ('star'). Will match exactly one token.

'#': Match zero or more ('glob'). There are two constraints on glob
tokens: a glob may appear at the end of the pattern, or it may be
followed by a non-wildcard token.

The tree expands all patterns into a sequence of nodes. The tree's
root node holds the first (leftmost) token of the pattern. Each
following token is added as a descendant of the node that contains the
token immediately to its left.

The pattern's path is the sequence of tokens from the tree root to the
last token of the pattern.

A hash table will be paired with the tree for use when matching
addresses against the tree's patterns.

A reference to each node in the tree will be added to the hash table
as patterns are inserted into the tree. The hash key for each node
will consist of the portion of the pattern up to and including the
current token.

Wildcard tokens require special handling when constructing this hash
key. Instead of incorporating the wildcard token into the hash key, a
separator character will be substituted. If the wildcard is star then
the '.' separator will be used. Glob will use the '/' separator. The
reason behind this will become apparent later.

For example, given the following patterns:

X.Y.Z
X.A.*.B
X.Y.Z.C

Would result in a tree with the following nodes:

ROOT:
 [token='X', hash_key='X']

X children:
 [token='Y', hash_key='X.Y']
 [token='A', hash_key='X.A']

X.Y children:
 [token='Z', hash_key='X.Y.Z']

Y.Z children:
 [token='C', hash_key='X.Y.Z.C']

X.A children:
 [token='*', hash_key='X.A..']

A.* children:
 [token='B', hash_key='X.A...B']


+Address look-ups+


To look up an address in the tree the address will first be token-ized.

A search key is created and initialized to the first token of the
address. This key is then used in a hash lookup. If there is a match
then the next token in the address will be appended to the search key
and another hash lookup will be done. This process continues until
all the address tokens are consumed (current node is the match), or a
hash lookup fails.

If the lookup fails, the current node (last node that matched
successfully) will be checked if it has a star or glob child node. If
there is no star or glob child the address lookup has failed.

The next steps depend on whether there is a star child, glob child or both.

If there is a star child it takes precedence. For the star node we
want to ignore a single address token. The last token from the address
is removed from the end of the search key and discarded. A '.' is
appended to the search key and the current node pointer is advanced to
the star child node. The next address token is appended to the search
key and the search continues.


If there is no star, or the star lookup failed, then check if there is
a glob child. For a glob we must discard zero or more address tokens
until there is a match or we run out of tokens (matches the glob
node). First the current node is advanced to the glob child node.
Then the last token in the search key is removed, a '/' is added, the
last token is appended after the '/'. For example a key of "foo.bar"
is changed to "foo./.bar" (this is the 'match zero' glob case). A
hash lookup is then done.

If this lookup fails then the current token is discarded and replaced
in the search key with the next token from the address. This process
continues until there is either a match or the address runs out of all
tokens. If the search runs out of tokens then the current node (the
glob node) is the matching node.

 

 

> Optimize the parse tree match algorithm to avoid O(N) lookup
> ------------------------------------------------------------
>
>                 Key: DISPATCH-1682
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1682
>             Project: Qpid Dispatch
>          Issue Type: Bug
>          Components: Routing Engine
>    Affects Versions: 1.12.0
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Major
>             Fix For: Backlog
>
>
> The parse tree pattern match algorithm is optimized to search using a key that is made up of a sequence of tokens.
> If all keys inserted into the parse tree are only single tokens then the lookup degrades into a linear list search. 
> The treatment for every address is looked up in the parse tree.  If the address is not found the default treatment is used.  Lookups that miss end up performing O(N) searches.
> The match algorithm should be re-designed to avoid O(N) searches for single token patterns.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org