Chapter 11: Hash Tables

Contents

11 Hash Tables

11 Hash Tables

11.1 Direct-address tables

Direct addressing is a simple technique that works well when the universe U of keys is reasonably small.

DIRECT-ADDRESS-SEARCH(T,k)
    return T[k]
DIRECT-ADDRESS-INSERT(T,k)
    T[x.key] = x
DIRECT-ADDRESS-DELETE(T,x)
    T[x.key] = NIL

DIRECT-ADDRESS-SEARCH(T,k)

return T[k]

DIRECT-ADDRESS-INSERT(T,k)

T[x.key] = x

DIRECT-ADDRESS-DELETE(T,x)

T[x.key] = NIL

11.2 Hash tables

With direct addressing, an element with key k is stored in slot k
With hashing, this element is stored in slot h(k)
\(h: U \rightarrow \{0,1,\dots ,m-1\}\)

Collision resolution by chaining

DIRECT-ADDRESS-INSERT(T,x)
    insert x at the head of list T[h(x.key)]
DIRECT-ADDRESS-SEARCH(T,k)
    search for an element with key k in list T[h(k)]
DIRECT-ADDRESS-DELETE(T,x)
    delete x from the list T[h(x.key)]

DIRECT-ADDRESS-INSERT(T,x)

insert x at the head of list T[h(x.key)]

DIRECT-ADDRESS-SEARCH(T,k)

search for an element with key k in list T[h(k)]

DIRECT-ADDRESS-DELETE(T,x)

delete x from the list T[h(x.key)]

Analysis of hashing with chaining
define the load factor \(\alpha\) for T as n/m
call the assumption of simple uniform hashing is that any given element is equally likely to hash into any of the m slots, independently of where any other element hash hashed to.
the expected value of the length of the list in one slot is \(\alpha\)

11.3 Hash functions

What makes a good hash function?
A good hash functon satisfies (approximately) the assumption of simple uniform hashing.
Unfortunately, we typically have no way to check this condition, since we rarely know the probability distribution from which the keys are drawn. Moreover, the keys might not be drawn independently.
Interpreting keys as natural numbers
For example, we can interpret a character string as an integer expressed in suitable radix notation.
In what follows, we assume that the keys are natural numbers.

11.3.1 The division method

h(k) = k mod m
When using the division method, we usually avoid certain values of m.
For example, m should not be a power of 2, since if \(m = 2^p\), then h(k) is just the p lowest-order bits of k. Unless we know that all low-order p-bit patterns are equally likely, we are better off designing the hash function to depend on all bits of the key.
When k is a character string interpret in radix \(2^p\), m should not be \(2^p – 1\) because permuting the character of k does not change its hash value.
A prime not too close to an exact power of 2 is often a good choice for m.

S	M	T	W	T	F	S
« Jun
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30	31

诟屍