As with sorting, we can take advantage of properties of strings to develop search methods (symboltable implementations) that can be more efficient than the generalpurpose methods for typical applications where search keys are strings.
Specifically, the methods that we consider in this section achieve the following performance characteristics in typical applications, even for huge tables:
 Search hits take time proportional to the length of the search key
 Search misses involve examining only a few characters
On reflection, these performance characteristics are quite remarkable, one of the crowning achievements of algorithmic technology and a primary factor in enabling the development of the computational infrastructure we now enjoy that has made so much information instantly accessible. Moreover, we can extend the symboltable API to include characterbased operations defined for string keys (but not necessarily for all Comparable
types of keys) that are powerful and quite useful in practice, as in the following API:


As we saw for sorting with string keys, it is often quite important to be able to work with strings from a specified alphabet. Simple and efficient implementations that are the method of choice for small alphabets turn out to be useless for large alphabets because they consume too much space. In such cases, it is certainly worthwhile to add a constructor that allows clients to specify the alphabet. We will consider the implementation of such a constructor later in this section but omit it from the API for now, in order to concentrate on string keys.
The following descriptions of the three new methods use the keys {she sells sea shells by the sea shore}
to give examples:
longestPrefixOf()
takes a string as argument and returns the longest key in the symbol table that is a prefix of that string. For the keys above,longestPrefixOf("shell")
is she andlongestPrefixOf("shellsort")
isshells
keysWithPrefix()
takes a string as argument and returns all the keys in the symbol table having that string as prefix. For the keys above,keysWithPrefix("she")
isshe
andshells
, andkeysWithPrefix("se")
issells
andsea
keysThatMatch()
takes a string as argument and returns all the keys in the symbol table that match that string, in the sense that a period (.) in the argument string matches any character. For the keys above,keysThatMatch(".he")
returnsshe
andthe
, andkeysThatMatch("s..")
returnsshe
andsea
Tries
In this section, we consider a search tree known as a trie, a data structure built from the characters of the string keys that allows us to use the characters of the search key to guide the search. The name “trie” is a bit of wordplay introduced by E. Fredkin in 1960 because the data structure is used for retrieval, but we pronounce it “try” to avoid confusion with “tree.”
Basic properties
As with search trees, tries are data structures composed of nodes that contain links that are either null or references to other nodes. Each node is pointed to by just one other node, which is called its parent (except root), and each node has R links, where R is the alphabet size.
Each node also has a corresponding value, which may be null or the value associated with one of the string keys in the symbol table. Specifically, we store the value associated with each key in the node corresponding to its last character. It’s very important to bear in mind the following fact: nodes with null values exist to facilitate search in the trie and do not correspond to keys.


Search in a trie
Finding the value associated with a given string key in a trie is a simple process, guided by the characters in the search key. Each node in the trie has a link corresponding to each possible string character. We start at the root, then follow the link associated with the first character in the key; from that node we follow the link associated with the second character in the key; from that node we follow the link associated with the third character in the key and so forth, until reaching the last character of the key or a null
link.
 The value at the node corresponding to the last character in the key is not
null
(as in the searches for shells and she depicted at left above). This result is a search hit — the value associated with the key is the value in the node corresponding to its last character  The value in the node corresponding to the last character in the key is
null
(as in the search for shell depicted at top right above). This result is a search miss: the key is not in the table  The search terminated with a
null
link (as in the search for shore depicted at bottom right above). This result is also a search miss


Insertion into a trie
As with binary search trees, we insert by first doing a search: in a trie that means using the characters of the key to guide us down the trie until reaching the last character of the key or a null
link. At this point, one of the following two conditions holds:
 We encountered a
null
link before reaching the last character of the key. In this case, there is no trie node corresponding to the last character in the key, so we need to create nodes for each of the characters in the key not yet encountered and set the value in the last one to the value to be associated with the key  We encountered the last character of the key before reaching a
null
link. In this case, we set that node’s value to the value to be associated with the key (whether or not that value isnull
), as usual with our associative array convention


Node representation
As mentioned at the outset, our trie diagrams do not quite correspond to the data structures our programs will build, because we do not draw null
links. Taking null
links into account emphasizes the following important characteristics of tries:
 Every node has R links, one for each possible character
 Characters and keys are implicitly stored in the data structure


Size
As for the binary search trees, three straightforward options are available for implementing size()
:
 An eager implementation where we maintain the number of keys in an instance variable N
 A very eager implementation where we maintain the number of keys in a subtrie as a node instance variable that we update after the recursive calls in
put()
anddelete()
 A lazy recursive implementation like the one at right. It traverses all of the nodes in the trie, counting the number having a nonnull value


Collecting keys


Wildcard match


Longest prefix


Deletion
The first step needed to delete a keyvalue pair from a trie is to use a normal search to find the node corresponding to the key and set the corresponding value to null
.
 If that node has a non
null
link to a child, then no more work is required  If all the links are
null
, we need to remove the node from the data structure  If doing so leaves all the links
null
in its parent, we need to remove that node, and so forth


Properties of tries
Proposition The linked structure (shape) of a trie is independent of the key insertion/deletion order: there is a unique trie for any given set of keys.
Worstcase time bound for search and insert
Proposition The number of array access when searching in a trie or inserting a key into a trie is at most 1 plus the length of the key.
Expected time bound for search miss
Proposition The average number of nodes examined for search miss in a trie built from N random keys over an alphabet of size R is ~logRN.
From a practical standpoint, the most important implication of this proposition is that search miss does not depend on the key length. For example, it says that unsuccessful search in a trie built with 1 million random keys will require examining only three or four nodes, whether the keys are 7digit license plates or 20digit account numbers.
Space
Proposition The number of links in a trie is between RN and RNw, where w is the average key length.
The table on the facing page shows the costs for some typical applications that we have considered. It illustrates the following rules of thumb for tries:
 When keys are short, the number of links is close to RN
 When keys are long, the number of links is close to RNw
 Therefore, decreasing R can save a huge amount of space
Oneway branching
The primary reason that trie space is excessive for long keys is that long keys tend to have long tails in the trie, with each node having a single link to the next node (and, therefore, R−1 null links). This situation known as external oneway branching is not difficult to correct. A trie might also have internal oneway branching. For example, two long keys may be equal except for their last character. This situation is a bit more difficult to address.
These changes can make trie space usage a less important factor than for the straightforward implementation that we have considered, but they are not necessarily effective in practical applications. Next, we consider an alternative approach to reducing space usage for tries.
THE BOTTOM LINE is this: do not try to use the sample algorithm for large numbers of long keys taken from large alphabets, because it will require space proportional to R times the total number of key characters. Otherwise, if you can afford the space, trie performance is difficult to beat.
Tenary Search Tries (TSTs)
To help us avoid the excessive space cost associated with Rway tries, we now consider an alternative representation: the ternary search trie (TST). In a TST, each node has a character, three links, and a value. The three links correspond to keys whose current characters are less than, equal to, or greater than the node’s character.
In the Rway tries, trie nodes are represented by R links, with the character corresponding to each nonnull link implictly represented by its index. In the corresponding TST, characters appear explicitly in nodes — we find characters corresponding to keys only when we are traversing the middle links.


Properties of TSTs
Space
Proposition The number of links in a TST built from N string keys of average length w is between 3N and 3Nw.
Search cost
Proposition A search miss in a TST built from N random string keys requires ~lnN character compares on the average. A search hit or an insertion in a TST uses ~lnN+L character compares, where L is the length of the search key.
Summary
As with string sorting, we are naturally interested in how the stringsearching methods that we have considered compare to the generalpurpose methods that we considered in CHAPTER 3. The following table summarizes the important characteristics of the algorithms that we have discussed in this section (the rows for BSTs, redblack BSTs and hashing are included from CHAPTER 3, for comparison). For a particular application, these entries must be taken as indicative, not definitive, since so many factors (such as characteristics of keys and mix of operations) come into play when studying symboltable implementations.


 If space is available, Rway tries provide the fastest search, essentially completing the job with a constant number of character compares
 For large alphabets, where space may not be available for Rway tries, TSTs are preferable, since they use a logarithmic number of character compares, while BSTs use a logarithmic number of key compares
 Hashing can be competitive, but, as usual, cannot support ordered symboltable operations or extended characterbased API operations such as prefix or wildcard match