    
    
    
Hash Policies
    This subsection describes hash policies. It is organized as follows:
    
 The #general_termsGeneral Terms  Section describes
            some general terms.
    
    
 The #range_hashing_fnsRange-Hashing Functions  Section
        describes range-hasing functions.
    
 The #hash_policies_ranged_hash_policiesRanged-Hash Functions  Section
        describes ranged-hash functions. 
    
 The #pb_assoc_impImplementation in pb_assoc  Section
            describes the implementation of these concepts in 
pb_assoc.
    
General Terms 
    There
are actually three functions involved in transforming a key into a hash-table's
position (see Figure
#hash_ranged_hash_range_hashing_fnsHash runctions, ranged-hash functions, and range-hashing functions
):
    
        A 
ranged-hash function, which maps keys into an interval of the
        non-negative integrals. This is the function actually required by the
        hash-table algorithm.
    
    
        A hash function, which maps keys into non-negative integral types. This is
        typically specified by the writer of the key class.
    
    
        A 
range-hashing function, which maps non-negative integral types into an
        interval of non-negative integral types.
    
no image Hash runctions, ranged-hash functions, and range-hashing functions.
    Let 
U be a domain (e.g., the integers, or the strings of 3
    characters). A hash-table algorithm needs to map elements of 
U "uniformly"
    into the range 
[0,..., m - 1] (where m is a non-negative integral
    value, and is, in general, time varying). 
I.e., the algorithm needs a ranged-hash    function
    
f : U × Z+ → Z+ ,
    such that for any 
u in U,
    
0 ≤ f(u, m) ≤ m - 1 ,
    and which has "good uniformity" properties [
references.html#knuth98sortingknuth98sorting ].
    One common solution is to use the composition of the hash function
    
h : U → Z+ ,
    which maps elements of 
U into the non-negative integrals, and
    
g : Z+ × Z+ → Z+, 
    which maps a non-negative hash value, and a non-negative range upper-bound into
    a non-negative integral in the range between 0 (inclusive) and the range upper
    bound (exclusive), 
i.e., for any r in Z+,
    
0 ≤ g(r, m) ≤ m - 1 .
    The resulting ranged-hash function, is
    
f(u , m) = g(h(u), m)     
(1) .
    From the above, it is obvious that given 
g and h, f can
    always be composed (however the converse is not true).
    The above describes the case where a key is to be mapped into a 
single
position
 within a hash table, e.g., in a collision-chaining table.
In other cases, a key is to be mapped into a 
sequence of poisitionswithin a table, 
e.g., in a probing table.
    Similar terms apply in this case: the table requires a 
ranged probefunction, mapping a key into a sequence of positions withing the table. This is
typically acheived by composing a 
hash function mapping the key
into a non-negative integral type, a 
probe function transforming the
hash value into a sequence of hash values, and a 
range-hashing function
transforming the sequence of hash values into a sequence of positions.
Range-Hashing Functions 
    Some common choices for range-hashing functions are the division,
    multiplication, and middle-square methods [
references.html#knuth98sortingknuth98sorting ],
    defined as
    
g(r, m) = r mod m (2) ,
    
g(r, m) = ⌈ u/v ( a r mod v ) ⌉ ,
    and
    
g(r, m) = ⌈ u/v ( r2 mod v ) ⌉ ,
respectively, for some positive integrals 
u and v (typically
powers of 2), and some 
a. Each of these range-hashing functions works
best for some different setting.
    The division method 
#division_method(2)  is a very common
    choice. However, even this single method can be implemented in two very
    different ways. It is possible to implement 
#division_method(2)     using the low level 
% (modulo) operation (for any m), or the low
    level 
& (bit-mask) operation (for the case where m is a power of
    2), 
i.e.,
    
g(r, m) = r % m (3) ,
    and
    
    
g(r, m) = r & m - 1, ( m = 2k    
        for some
 k) (4) ,
    respectively.
    The 
% (modulo) implementation #division_method_prime_mod(3)     has the advantage that for 
m a prime far from a power of 2, g(r, m)    is affected by all the bits of 
r (minimizing the chance of collision).
    It has the disadvantage of using the costly modulo operation. This method is
    hard-wired into SGI's implementation [
references.html#sgi_stlsgi_stl ].
    The 
& (bit-mask) implementation #division_method_bit_mask(4)     has the advantage of relying on the fast bitwise and operation. It has the
    disadvantage that for 
g(r, m) is affected only by the low order bits of r.
    This method is hard-wired into Dinkumware's implementation [
references.html#dinkumware_stldinkumware_stl ].
Ranged-Hash Functions 
    Although rarer, there are cases where it is beneficial to allow the client to
directly specify a ranged-hash hash function. It is true, that the writer of
the ranged-hash function cannot rely on the values of 
m having specific
numerical properties suitable for hashing (in the sense used in [
references.html#knuth98sortingknuth98sorting ]),
since the values of 
m are determined by a resize policy with possibly
orthogonal considerations [
references.html#austern98segmentedaustern98segmented ].
The values of 
m can be used in some cases, though, to estimate the
"general" number of distinct values required.
    Let
    
s = [ s0,..., st - 1] 
    be a string of 
t characters, each of which is from domain S.
Consider the following ranged-hash function:
    
        
            f
1(s, m) =
            ∑ 
i =
            0
t   - 1 si ai mod m     
 (5) ,
    where 
a is some non-negative integral value. This is the standard
string-hashing function used in SGI's implementation (with 
a = 5) [ references.html#sgi_stlsgi_stl ].
Its advantage is that it takes into account all of the characters of the
string.
    Now assume that 
s is the string representation of a of a long DNA
sequence (and so 
S = {'A', 'C', 'G', 'T'}). In this case, scanning the
entire string might be prohibitively expensive. A possible alternative might be
to use only the first 
k characters of the string, where
    k 
|S| ≥ m ,
    
i.e., using the hash function
    
f2(s, m) = ∑ i = 0k
                - 1
 si ai mod m , (6)
    requiring scanning over only
    
k = log4( m ) 
    characters.
    Other more elaborate hash-functions might scan 
k characters starting at
    a random position (determined at each resize), or scanning 
k random
    positions (determined at each resize), 
i.e., using
    
f3(s, m) = ∑ i = r0r0 + k - 1        s
i ai mod m ,
    or
    
f4(s, m) = ∑ i = 0k - 1 sri        a
ri mod m ,
    respectively, for 
r0,..., rk-1 each in the
    (inclusive) range 
[0,...,t-1].
Implementation in pb_assoc 
    Containers based on collision-chaining hash tables in 
pb_assocare parameterized by the functors 
Hash_Fn, and Comb_Hash_Fn.
    If such a container is instantiated with any hash functor and
range-hashing functor, the container will synthesize a ranged-hash functor
automatically. For example, Figure
#hash_range_hashing_seq_diagramInsert hash sequence diagram
shows an 
insert sequence diagram. The user inserts an element (point A),
the container transforms the key into a non-negative integral using the hash
functor (points B and C), and transforms the result into a position
using the combining functor (points D and E).
no image 
Insert hash sequence diagram.
    If such a container is instantiated with the
concepts.html#concepts_null_policiesnull policy hash functor,
null_hash_fn.htmlnull_hash_fn ,
and a combining-hash functor, the container treats
the combining hash functor as a ranged-hash function. For example, Figure
#hash_range_hashing_seq_diagram2Insert hash sequence diagram with a null combination policy
shows an 
insert sequence diagram. The user inserts an element (point A),
the container transforms the key into a position
using the combining functor (points B and C).
no image 
Insert hash sequence diagram with a null combination policy.
    
pb_assoc contains the following hash-related policies:
    
direct_mask_range_hashing.htmldirect_mask_range_hashing and
direct_mod_range_hashing.htmldirect_mod_range_hashing are range-hashing functions based on a bit-mask and a modulo operation, respectively.
    
    
linear_probe_fn.htmllinear_probe_fn  and
quadratic_probe_fn.htmlquadratic_probe_fn  are probe
classes based on linear and quadratic increment, respectively.
    
    
null_hash_fn.htmlnull_hash_fn and
null_probe_fn.htmlnull_probe_fn are
concepts.html#concepts_null_policiesnull policy classes  for creating
ranged-hash and ranged-probe functions directly (
i.e., not through
composition).
    
    
pb_assoc does not provide any hash functions (it relies on those
of the STL).
