You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cocoon.apache.org by Nicola Ken Barozzi <ni...@apache.org> on 2002/12/05 09:12:48 UTC

[Fwd: Re: [Collections] [SUBMIT] Trie]

In case someone here sees some use of this for our URI handling...

-------- Original Message --------
Subject: Re: [Collections] [SUBMIT] Trie
Date: Thu, 5 Dec 2002 14:31:28 +1300 (NZDT)
From: Rich Dougherty <ri...@rd.gen.nz>
To: <co...@jakarta.apache.org>


>> I've written an implementation for a trie which I'd like to
>> contribute. I've attached a simple implementation and a test
>> case.
>
> I've taken a quick look at the code here, and that all looks fine.
>
> Unfortunately, I don't really know what I'm looking at! What I mean is
> that I've never heard of a 'Trie' before, and am thus wondering as to
> why I would use it, and is it general enough to be in [collections].
>
> In the past we have had discussions about Tree implementations (and I
> have coded some before). This may be related, as the Trie appears to be
> a recursive structure.
>
> Perhaps, some use cases as to why a Trie is useful, especially in a
> general/server environment would be useful. Thanks.

That sounds reasonable to me. :-)

I wrote this trie to solve a specific problem for me. In a web
application I'm writing I have a set or URIs that map to a handler for
a request. I have mappings like:

   / -> home page
   /article -> article list page
   /article/view -> view article page
   /article/edit -> edit article page
   /auth/login -> login page
   /auth/logout -> logout page

I wanted a data structure to store these mappings. I have written a
class called SimplifiedURITrie which extends AbstractTransitionMapTrie
to store mappings of these type.

I submitted StringTrie as a concrete example of the Trie interface. I
haven't submitted SimplifiedURITrie since I felt it was very specific
to my application, but if you're interseted I've attached it. If you
want to include it in Commons (which I doubt), then I'm happy to tidy
it up a bit.

Back to my application. A trie that held these mappings would look
like this:

+ -> home page
|
+-article-+ -> article list page
|         |
|         +-view-+ -> view article page
|         |
|         +-edit-+ -> edit article page
|
+-auth-+
        |
        +-login-+ -> login page
        |
        +-logout-+ -> logout page

Notice how the key (the URI) is broken into parts with one part being
stored on each transition. All keys that share a common prefix hang
off a particular node. See the diagram at
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie.html for an
example with strings.

For my application a trie was useful for a couple of reasons:

- Space efficiency. Each part of the key is only stored once. I would
   only need to store the transition "article" once even if I had
   hundreds of URLs starting with "/article".

   The actual space efficiency of a trie depends on how efficiently the
   transitions are stored, and the nature of the data stored in it. A
   dictionary of strings could be stored in a trie quite
   efficiently. However, I wouldn't suggest the current StringTrie
   implementation be used for this reason. It uses a HashMap for
   storing the transitions from each node, which means it uses more
   space than is strictly necessary.

- Can efficiently find the entries whose keys are the prefix of a
   given value. The method getLongest() returns the longest of these
   keys. This is useful for me in my URI matching problem. If
   someone gives me the URI "/article/view/43" I can search the trie
   and find that the best match is "/article/view". (I'll pass the
   "43" along to my handler so it can show the appropriate article.)

I also added the method prefixEntrySet() to the interface because it
was efficient to perform and seemed like it might be useful. Prompted
by your question I did a bit of searching and found that it would be
handy for substring matching using a "suffix trie":

   http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Suffix/

Cool huh.

I'm not really a trie zealot, but I think a trie is a reasonably
useful data structure. I couldn't find an implementation with a
suitable license anywhere, so I wrote my own. I found an
implementation in a servlet container written by someone at HP (open
source, with heavy restrictions) and one in LimeWire (GPL, not LGPL or
Apache). Also many implementations were too limited for me. Most
operated only on Strings or byte arrays, I wanted something more
general. I hope that helps.

Rich



-- 
Nicola Ken Barozzi                   nicolaken@apache.org
             - verba volant, scripta manent -
    (discussions get forgotten, just code remains)
---------------------------------------------------------------------