# Hash Table

**Pintos provides a hash table data structure in `lib/kernel/hash.c`.**

* To use it you will need to include its header file, `lib/kernel/hash.h`, with `#include <hash.h>`.
* **No code provided with Pintos uses the hash table**, which means that you are free to use it as is, modify its implementation for your own purposes, or ignore it, as you wish.
* **Most implementations of the virtual memory project use a hash table to translate pages to frames.** You may find other uses for hash tables as well.

## Data Types

### Basic Element

A hash table is represented by **`struct hash`**.

* <mark style="color:blue;">**Type: struct hash**</mark>
  * **Represents an entire hash table.**
  * The actual members of `struct hash` are "**opaque**." That is, code that uses a hash table should not access `struct hash` members directly, nor should it need to. Instead, use hash table functions and macros.
  * **The hash table operates on elements of type `struct hash_elem`.**
* <mark style="color:blue;">**Type: struct hash\_elem**</mark>
  * **Embed a `struct hash_elem` member in the structure you want to include in a hash table.**
  * Like `struct hash`, `struct hash_elem` is **opaque**. All functions for operating on hash table elements actually take and return pointers to `struct hash_elem`, not pointers to your hash table's real element type.

**You will often need to obtain a `struct hash_elem` given a real element of the hash table, and vice versa.** Given a real element of the hash table, you may use the `&` operator to obtain a pointer to its `struct hash_elem`. Use the **`hash_entry()`** macro to go the other direction.

* <mark style="color:blue;">**Macro: type \*hash\_entry (struct hash\_elem \*elem, type, member)**</mark>
  * **Returns a pointer to the structure that \_elem**\_**, a pointer to a `struct hash_elem`, is embedded within.**
  * You must provide *type*, the name of the structure that elem is inside, and *member*, the name of the *member* in *type* that *elem* points to.
  * For example, suppose `h` is a `struct hash_elem *` variable that points to a `struct thread` member (of type `struct hash_elem`) named `h_elem`. Then, `hash_entry(h, struct thread, h_elem)` yields the address of the `struct thread` that `h` points within.

See section [Hash Table Example](#hash-table-example), for an example.

### Hash Functions

**Each hash table element must contain a key, that is, data that identifies and distinguishes elements, which must be&#x20;*****unique*** **among elements in the hash table.** (Elements may also contain non-key data that need not be unique.)

* **While an element is in a hash table, its key data must not be changed.** Instead, if need be, remove the element from the hash table, modify its key, then reinsert the element.
* For each hash table, you ***must*** write two functions that act on keys: **a hash function** and **a comparison function**.

These functions must match the following prototypes:

* <mark style="color:blue;">**Type: unsigned hash\_hash\_func (const struct hash\_elem \*element, void \*aux)**</mark>
  * **Returns a hash of&#x20;*****element*****'s data, as a value anywhere in the range of `unsigned int`.** The hash of an *element* should be a pseudo-random function of the *element*'s key. It must not depend on non-key data in the element or on any non-constant data other than the key. Pintos provides the some functions (See Hash Functions Basis) as a suitable basis for hash functions.
  * See section [Auxiliary Data](#auxiliary-data), for an explanation of *aux*.

<details>

<summary>Hash Functions Basis</summary>

* <mark style="color:blue;">**Function: unsigned hash\_bytes (const void \*buf, size\_t \*size)**</mark>
  * **Returns a hash of the&#x20;*****size*****&#x20;bytes starting at&#x20;*****buf*****.**
  * The implementation is the general-purpose [Fowler-Noll-Vo hash](http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash) for 32-bit words.
* <mark style="color:blue;">**Function: unsigned hash\_string (const char \*s**</mark>**)**
  * **Returns a hash of null-terminated string&#x20;*****s*****.**
* <mark style="color:blue;">**Function: unsigned hash\_int (int i)**</mark>
  * **Returns a hash of integer&#x20;*****i*****.**

**Notes**

* If your key is **a single piece of data** of an appropriate type, it is sensible for your hash function to directly return the output of one of these functions.
* For **multiple pieces of data**, you may wish to **combine the output** of more than one call to them using, e.g., the **^** (exclusive or) operator.
* Finally, you may entirely ignore these functions and **write your own hash function** from scratch, but remember that your goal is to build an operating system kernel, not to design a hash function.

</details>

* <mark style="color:blue;">**Type: bool hash\_less\_func (const struct hash\_elem \*a, const struct hash\_elem \*b, void \*aux)**</mark>
  * **Compares the keys stored in elements&#x20;*****a*****&#x20;and&#x20;*****b*****.**
  * Returns true if *a* is less than *b*, false if *a* is greater than or equal to *b*.
  * **If two elements compare equal, then they must hash to equal values.**
  * See section [Auxiliary Data](#auxiliary-data), for an explanation of aux.
  * See section [Hash Table Example](#hash-table-example), for hash and comparison function examples.

### Hash Action Function

A few functions in this hash table implementation accepts a pointer to a third kind of function (called hash action function) as an argument:

* <mark style="color:blue;">**Type: void hash\_action\_func (struct hash\_elem \*element, void \*aux)**</mark>
  * **Performs some kind of action, chosen by the caller, on&#x20;*****element*****.**
  * See section [Auxiliary Data](#auxiliary-data), for an explanation of *aux*.

## Basic Functions

These functions ***create***, ***destroy***, and ***inspect*** hash tables.

* <mark style="color:blue;">**Function: bool hash\_init (struct hash \*hash, hash\_hash\_func \*hash\_func, hash\_less\_func \*less\_func, void \*aux)**</mark>
  * **Initializes&#x20;*****hash*****&#x20;as a hash table with&#x20;*****hash\_func*****&#x20;as hash function,&#x20;*****less\_func*****&#x20;as comparison function, and&#x20;*****aux*****&#x20;as auxiliary data.**
  * Returns true if successful, false on failure. `hash_init()` calls `malloc()` and fails if memory cannot be allocated.
  * See section [Auxiliary Data](#auxiliary-data), for an explanation of *aux*, which is most often a null pointer.
* <mark style="color:blue;">**Function: void hash\_clear (struct hash \*hash, hash\_action\_func \*action)**</mark>
  * **Removes all the elements from&#x20;*****hash*****, which must have been previously initialized with `hash_init()`.**
  * **If&#x20;*****action*** **is non-null, then it is called once for each element in the hash table**, which gives the caller an opportunity to deallocate any memory or other resources used by the element. For example, if the hash table elements are dynamically allocated using `malloc()`, then action could `free()` the element. This is safe because `hash_clear()` will not access the memory in a given hash element after calling *action* on it. However, **action must not call any function that may modify the hash table**, such as `hash_insert()` or `hash_delete()`.
* <mark style="color:blue;">**Function: void hash\_destroy (struct hash \*hash, hash\_action\_func \*action)**</mark>
  * **If&#x20;*****action*** **is non-null, calls it for each element in the** ***hash***, with the same semantics as a call to `hash_clear()`.
  * **Then, frees the memory held by&#x20;*****hash*****.**
  * Afterward, *hash* must not be passed to any hash table function, absent an intervening call to `hash_init()`.
* <mark style="color:blue;">**Function: size\_t hash\_size (struct hash \*hash)**</mark>
  * **Returns the number of elements currently stored in&#x20;*****hash*****.**
* <mark style="color:blue;">**Function: bool hash\_empty (struct hash \*hash)**</mark>
  * Returns true if *hash* currently contains no elements, false if *hash* contains at least one element.

## Search Functions

Each of these functions **searches a hash table for an element that compares equal to one provided**. Based on the success of the search, they perform some action, such as inserting a new element into the hash table, or simply return the result of the search.

* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_insert (struct hash \*hash, struct hash\_elem \*element)**</mark>
  * Searches *hash* for an element equal to *element*.
  * If none is found, inserts *element* into *hash* and returns a null pointer. If the table already contains an element equal to *element*, it is returned without modifying *hash*.
* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_replace (struct hash \*hash, struct hash\_elem \*element)**</mark>
  * Inserts *element* into *hash*. Any element equal to *element* already in *hash* is removed. Returns the element removed, or a null pointer if *hash* did not contain an element equal to *element*.
  * **The caller is responsible for deallocating any resources associated with the returned element, as appropriate.** For example, if the hash table elements are dynamically allocated using `malloc()`, then the caller must `free()` the element after it is no longer needed.

**The element passed to the following functions is only used for hashing and comparison purposes.** It is never actually inserted into the hash table. Thus, **only key data in the element needs to be initialized**, and other data in the element will not be used. It often makes sense to declare an instance of the element type as a local variable, initialize the key data, and then pass the address of its `struct hash_elem` to `hash_find()` or `hash_delete()`. See section [Hash Table Example](#hash-table-example), for an example. (Large structures should not be allocated as local variables. See section [Struct thread](https://pkuflyingpig.gitbook.io/pintos/appendix/threads#struct-thread), for more information.)

* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_find (struct hash \*hash, struct hash\_elem \*element)**</mark>
  * **Searches&#x20;*****hash*****&#x20;for an element equal to&#x20;*****element*****.**
  * Returns the element found, if any, or a null pointer otherwise.
* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_delete (struct hash \*hash, struct hash\_elem \*element)**</mark>
  * **Searches&#x20;*****hash*****&#x20;for an element equal to&#x20;*****element*****.**
  * If one is found, it is removed from *hash* and returned. Otherwise, a null pointer is returned and *hash* is unchanged.

{% hint style="info" %}
**The caller is responsible for deallocating any resources associated with the returned element, as appropriate.** For example, if the hash table elements are dynamically allocated using `malloc()`, then the caller must `free()` the element after it is no longer needed.
{% endhint %}

## Iteration Functions

**These functions allow iterating through the elements in a hash table.** Two interfaces are supplied. The first requires writing and supplying a ***hash\_action\_func*** to act on each element (see section [Data Types](#data-types)).

* <mark style="color:blue;">**Function: void hash\_apply (struct hash \*hash, hash\_action\_func \*action)**</mark>
  * **Calls&#x20;*****action*****&#x20;once for each element in&#x20;*****hash*****,&#x20;**<mark style="color:red;">**in arbitrary order**</mark>**.**
  * ***action*****&#x20;must not call any function that may modify the hash table**, such as `hash_insert()` or `hash_delete()`.
  * ***action*****&#x20;must not modify key data in elements**, although it may modify any other data.

The second interface is based on an "**iterator**" data type. Idiomatically, iterators are used as follows:

```c
struct hash_iterator i;

hash_first (&i, h);
while (hash_next (&i))
  {
    struct foo *f = hash_entry (hash_cur (&i), struct foo, elem);
    ...do something with f...
  }
```

* <mark style="color:blue;">**Type: struct hash\_iterator**</mark>
  * **Represents a position within a hash table.**
  * Calling any function that may modify a hash table, such as `hash_insert()` or `hash_delete()`, **invalidates** all iterators within that hash table.
  * Like `struct hash` and `struct hash_elem`, `struct hash_iterator` is **opaque**.
* <mark style="color:blue;">**Function: void hash\_first (struct hash\_iterator \*iterator, struct hash \*hash)**</mark>
  * **Initializes&#x20;*****iterator*** **to just before the first element in** **hash.**
* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_next (struct hash\_iterator \*iterator)**</mark>
  * **Advances&#x20;*****iterator*** **to the next element in&#x20;*****hash*****, and returns that element.** Returns a null pointer if no elements remain.
  * After `hash_next()` returns null for *iterator*, calling it again yields undefined behavior.
* <mark style="color:blue;">**Function: struct hash\_elem \*hash\_cur (struct hash\_iterator \*iterator)**</mark>
  * **Returns the value most recently returned by `hash_next()` for&#x20;*****iterator*****.**
  * Yields undefined behavior after `hash_first()` has been called on *iterator* but before `hash_next()` has been called for the first time.

## Hash Table Example

Suppose you have a structure, called `struct page`, that you want to put into a hash table.

* First, define `struct page` to include a `struct hash_elem` member:

```c
struct page
  {
    struct hash_elem hash_elem; /* Hash table element. */
    void *addr;                 /* Virtual address. */
    /* ...other members... */
  };
```

* We **write a hash function and a comparison function** using *addr* as the key. A pointer can be hashed based on its bytes, and the `<` operator works fine for comparing pointers:

```c
/* Returns a hash value for page p. */
unsigned
page_hash (const struct hash_elem *p_, void *aux UNUSED)
{
  const struct page *p = hash_entry (p_, struct page, hash_elem);
  return hash_bytes (&p->addr, sizeof p->addr);
}

/* Returns true if page a precedes page b. */
bool
page_less (const struct hash_elem *a_, const struct hash_elem *b_,
           void *aux UNUSED)
{
  const struct page *a = hash_entry (a_, struct page, hash_elem);
  const struct page *b = hash_entry (b_, struct page, hash_elem);

  return a->addr < b->addr;
}
```

(The use of `UNUSED` in these functions' prototypes suppresses a warning that *aux* is unused. See section [Function and Parameter Attributes](https://pkuflyingpig.gitbook.io/pintos/getting-started/debug-and-test/debugging#function-and-parameter-attributes), for information about `UNUSED`)

* Then, we can **create a hash table** like this:

```c
struct hash pages;

hash_init (&pages, page_hash, page_less, NULL);
```

* Now we can manipulate the hash table we've created. If `p` is a pointer to a `struct page`, we can **insert** it into the hash table with:

```cpp
hash_insert (&pages, &p->hash_elem);
```

If there's a chance that pages might already contain a page with the same *addr*, then we should check `hash_insert()`'s return value.

* To **search** for an element in the hash table, use `hash_find()`. This takes a little setup, because `hash_find()` takes an element to compare against. Here's a function that will find and return a page based on a virtual address, assuming that *pages* is defined at file scope:

```cpp
/* Returns the page containing the given virtual address,
   or a null pointer if no such page exists. */
struct page *
page_lookup (const void *address)
{
  struct page p;
  struct hash_elem *e;

  p.addr = address;
  e = hash_find (&pages, &p.hash_elem);
  return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
}
```

`struct page` is allocated as a local variable here on the assumption that it is fairly small. Large structures should not be allocated as local variables. See section [Struct thread](https://pkuflyingpig.gitbook.io/pintos/appendix/threads#struct-thread), for more information.

* A similar function could **delete** a *page* by *address* using `hash_delete()`.

## Auxiliary Data

In simple cases like the example above, there's no need for the *aux* parameters. In these cases, just **pass a null pointer** to `hash_init()` for *aux* and ignore the values passed to the hash function and comparison functions. (You'll get a compiler warning if you don't use the *aux* parameter, but you can turn that off with the `UNUSED` macro, as shown in the example, or you can just ignore it.)

***aux*** **is useful when you have some property of the data in the hash table is both constant and needed for hashing or comparison, but not stored in the data items themselves.** For example, if the items in a hash table are fixed-length strings, but the items themselves don't indicate what that fixed length is, you could pass the length as an *aux* parameter.

## Synchronization

**The hash table does not do any internal synchronization.** **It is the caller's responsibility to synchronize calls to hash table functions.**

* In general, any number of functions that examine but do not modify the hash table, such as `hash_find()` or `hash_next()`, may execute simultaneously.
* However, these functions cannot safely execute at the same time as any function that may modify a given hash table, such as `hash_insert()` or `hash_delete()`, nor may more than one function that can modify a given hash table execute safely at once.

**It is also the caller's responsibility to synchronize access to data in hash table elements.** How to synchronize access to this data depends on how it is designed and organized, as with any other data structure.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://pkuflyingpig.gitbook.io/pintos/appendix/reference-guide/hash-table.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
