Hash Table
Pintos provides a hash table data structure in lib/kernel/hash.c
.
To use it you will need to include its header file,
lib/kernel/hash.h
, with#include <hash.h>
.No code provided with Pintos uses the hash table, which means that you are free to use it as is, modify its implementation for your own purposes, or ignore it, as you wish.
Most implementations of the virtual memory project use a hash table to translate pages to frames. You may find other uses for hash tables as well.
Data Types
Basic Element
A hash table is represented by struct hash
.
Type: struct hash
Represents an entire hash table.
The actual members of
struct hash
are "opaque." That is, code that uses a hash table should not accessstruct hash
members directly, nor should it need to. Instead, use hash table functions and macros.The hash table operates on elements of type
struct hash_elem
.
Type: struct hash_elem
Embed a
struct hash_elem
member in the structure you want to include in a hash table.Like
struct hash
,struct hash_elem
is opaque. All functions for operating on hash table elements actually take and return pointers tostruct hash_elem
, not pointers to your hash table's real element type.
You will often need to obtain a struct hash_elem
given a real element of the hash table, and vice versa. Given a real element of the hash table, you may use the &
operator to obtain a pointer to its struct hash_elem
. Use the hash_entry()
macro to go the other direction.
Macro: type *hash_entry (struct hash_elem *elem, type, member)
Returns a pointer to the structure that _elem_, a pointer to a
struct hash_elem
, is embedded within.You must provide type, the name of the structure that elem is inside, and member, the name of the member in type that elem points to.
For example, suppose
h
is astruct hash_elem *
variable that points to astruct thread
member (of typestruct hash_elem
) namedh_elem
. Then,hash_entry(h, struct thread, h_elem)
yields the address of thestruct thread
thath
points within.
See section Hash Table Example, for an example.
Hash Functions
Each hash table element must contain a key, that is, data that identifies and distinguishes elements, which must be unique among elements in the hash table. (Elements may also contain non-key data that need not be unique.)
While an element is in a hash table, its key data must not be changed. Instead, if need be, remove the element from the hash table, modify its key, then reinsert the element.
For each hash table, you must write two functions that act on keys: a hash function and a comparison function.
These functions must match the following prototypes:
Type: unsigned hash_hash_func (const struct hash_elem *element, void *aux)
Returns a hash of element's data, as a value anywhere in the range of
unsigned int
. The hash of an element should be a pseudo-random function of the element's key. It must not depend on non-key data in the element or on any non-constant data other than the key. Pintos provides the some functions (See Hash Functions Basis) as a suitable basis for hash functions.See section Auxiliary Data, for an explanation of aux.
Type: bool hash_less_func (const struct hash_elem *a, const struct hash_elem *b, void *aux)
Compares the keys stored in elements a and b.
Returns true if a is less than b, false if a is greater than or equal to b.
If two elements compare equal, then they must hash to equal values.
See section Auxiliary Data, for an explanation of aux.
See section Hash Table Example, for hash and comparison function examples.
Hash Action Function
A few functions in this hash table implementation accepts a pointer to a third kind of function (called hash action function) as an argument:
Type: void hash_action_func (struct hash_elem *element, void *aux)
Performs some kind of action, chosen by the caller, on element.
See section Auxiliary Data, for an explanation of aux.
Basic Functions
These functions create, destroy, and inspect hash tables.
Function: bool hash_init (struct hash *hash, hash_hash_func *hash_func, hash_less_func *less_func, void *aux)
Initializes hash as a hash table with hash_func as hash function, less_func as comparison function, and aux as auxiliary data.
Returns true if successful, false on failure.
hash_init()
callsmalloc()
and fails if memory cannot be allocated.See section Auxiliary Data, for an explanation of aux, which is most often a null pointer.
Function: void hash_clear (struct hash *hash, hash_action_func *action)
Removes all the elements from _hash_, which must have been previously initialized with
hash_init()
.If action is non-null, then it is called once for each element in the hash table, which gives the caller an opportunity to deallocate any memory or other resources used by the element. For example, if the hash table elements are dynamically allocated using
malloc()
, then action couldfree()
the element. This is safe becausehash_clear()
will not access the memory in a given hash element after calling action on it. However, action must not call any function that may modify the hash table, such ashash_insert()
orhash_delete()
.
Function: void hash_destroy (struct hash *hash, hash_action_func *action)
If action is non-null, calls it for each element in the hash, with the same semantics as a call to
hash_clear()
.Then, frees the memory held by hash.
Afterward, hash must not be passed to any hash table function, absent an intervening call to
hash_init()
.
Function: size_t hash_size (struct hash *hash)
Returns the number of elements currently stored in hash.
Function: bool hash_empty (struct hash *hash)
Returns true if hash currently contains no elements, false if hash contains at least one element.
Search Functions
Each of these functions searches a hash table for an element that compares equal to one provided. Based on the success of the search, they perform some action, such as inserting a new element into the hash table, or simply return the result of the search.
Function: struct hash_elem *hash_insert (struct hash *hash, struct hash_elem *element)
Searches hash for an element equal to element.
If none is found, inserts element into hash and returns a null pointer. If the table already contains an element equal to element, it is returned without modifying hash.
Function: struct hash_elem *hash_replace (struct hash *hash, struct hash_elem *element)
Inserts element into hash. Any element equal to element already in hash is removed. Returns the element removed, or a null pointer if hash did not contain an element equal to element.
The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using
malloc()
, then the caller mustfree()
the element after it is no longer needed.
The element passed to the following functions is only used for hashing and comparison purposes. It is never actually inserted into the hash table. Thus, only key data in the element needs to be initialized, and other data in the element will not be used. It often makes sense to declare an instance of the element type as a local variable, initialize the key data, and then pass the address of its struct hash_elem
to hash_find()
or hash_delete()
. See section Hash Table Example, for an example. (Large structures should not be allocated as local variables. See section Struct thread, for more information.)
Function: struct hash_elem *hash_find (struct hash *hash, struct hash_elem *element)
Searches hash for an element equal to element.
Returns the element found, if any, or a null pointer otherwise.
Function: struct hash_elem *hash_delete (struct hash *hash, struct hash_elem *element)
Searches hash for an element equal to element.
If one is found, it is removed from hash and returned. Otherwise, a null pointer is returned and hash is unchanged.
The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using malloc()
, then the caller must free()
the element after it is no longer needed.
Iteration Functions
These functions allow iterating through the elements in a hash table. Two interfaces are supplied. The first requires writing and supplying a hash_action_func to act on each element (see section Data Types).
Function: void hash_apply (struct hash *hash, hash_action_func *action)
Calls action once for each element in hash, in arbitrary order.
action must not call any function that may modify the hash table, such as
hash_insert()
orhash_delete()
.action must not modify key data in elements, although it may modify any other data.
The second interface is based on an "iterator" data type. Idiomatically, iterators are used as follows:
Type: struct hash_iterator
Represents a position within a hash table.
Calling any function that may modify a hash table, such as
hash_insert()
orhash_delete()
, invalidates all iterators within that hash table.Like
struct hash
andstruct hash_elem
,struct hash_iterator
is opaque.
Function: void hash_first (struct hash_iterator *iterator, struct hash *hash)
Initializes iterator to just before the first element in hash.
Function: struct hash_elem *hash_next (struct hash_iterator *iterator)
Advances iterator to the next element in hash, and returns that element. Returns a null pointer if no elements remain.
After
hash_next()
returns null for iterator, calling it again yields undefined behavior.
Function: struct hash_elem *hash_cur (struct hash_iterator *iterator)
Returns the value most recently returned by
hash_next()
for iterator.Yields undefined behavior after
hash_first()
has been called on iterator but beforehash_next()
has been called for the first time.
Hash Table Example
Suppose you have a structure, called struct page
, that you want to put into a hash table.
First, define
struct page
to include astruct hash_elem
member:
We write a hash function and a comparison function using addr as the key. A pointer can be hashed based on its bytes, and the
<
operator works fine for comparing pointers:
(The use of UNUSED
in these functions' prototypes suppresses a warning that aux is unused. See section Function and Parameter Attributes, for information about UNUSED
)
Then, we can create a hash table like this:
Now we can manipulate the hash table we've created. If
p
is a pointer to astruct page
, we can insert it into the hash table with:
If there's a chance that pages might already contain a page with the same addr, then we should check hash_insert()
's return value.
To search for an element in the hash table, use
hash_find()
. This takes a little setup, becausehash_find()
takes an element to compare against. Here's a function that will find and return a page based on a virtual address, assuming that pages is defined at file scope:
struct page
is allocated as a local variable here on the assumption that it is fairly small. Large structures should not be allocated as local variables. See section Struct thread, for more information.
A similar function could delete a page by address using
hash_delete()
.
Auxiliary Data
In simple cases like the example above, there's no need for the aux parameters. In these cases, just pass a null pointer to hash_init()
for aux and ignore the values passed to the hash function and comparison functions. (You'll get a compiler warning if you don't use the aux parameter, but you can turn that off with the UNUSED
macro, as shown in the example, or you can just ignore it.)
aux is useful when you have some property of the data in the hash table is both constant and needed for hashing or comparison, but not stored in the data items themselves.** For example, if the items in a hash table are fixed-length strings, but the items themselves don't indicate what that fixed length is, you could pass the length as an aux parameter.
Synchronization
The hash table does not do any internal synchronization. It is the caller's responsibility to synchronize calls to hash table functions.
In general, any number of functions that examine but do not modify the hash table, such as
hash_find()
orhash_next()
, may execute simultaneously.However, these functions cannot safely execute at the same time as any function that may modify a given hash table, such as
hash_insert()
orhash_delete()
, nor may more than one function that can modify a given hash table execute safely at once.
It is also the caller's responsibility to synchronize access to data in hash table elements. How to synchronize access to this data depends on how it is designed and organized, as with any other data structure.
Last updated