[![Build Status](https://travis-ci.org/greg7mdp/sparsepp.svg?branch=master)](https://travis-ci.org/greg7mdp/sparsepp)

# Sparsepp: A fast, memory efficient hash map for C++

Sparsepp is derived from Google's excellent [sparsehash](https://github.com/sparsehash/sparsehash) implementation. It aims to achieve the following objectives:

- A drop-in alternative for unordered_map and unordered_set.
- **Extremely low memory usage** (typically about one byte overhead per entry).
- **Very efficient**, typically faster than your compiler's unordered map/set or Boost's.
- **C++11 support** (if supported by compiler).
- **Single header** implementation - just copy `sparsepp.h` to your project and include it.
- **Tested** on Windows (vs2010-2015, g++), linux (g++, clang++) and MacOS (clang++).

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's `dense_hash_map` is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance). 

For a detailed comparison of various hash implementations, including Sparsepp, please see our [write-up](bench.md).

## Example

```c++
#include <iostream>
#include <string>
#include <sparsepp.h>

using spp::sparse_hash_map;
 
int main()
{
    // Create an unordered_map of three strings (that map to strings)
    sparse_hash_map<std::string, std::string> email = 
    {
        { "tom",  "tom@gmail.com"},
        { "jeff", "jk@gmail.com"},
        { "jim",  "jimg@microsoft.com"}
    };
 
    // Iterate and print keys and values 
    for (const auto& n : email) 
        std::cout << n.first << "'s email is: " << n.second << "\n";
 
    // Add a new entry
    email["bill"] = "bg@whatever.com";
 
    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";
 
    return 0;
}
```

## Installation

Since the full Sparsepp implementation is contained in a single header file `sparsepp.h`, the installation consist in copying this header file wherever it will be convenient to include in your project(s). 

Optionally, a second header file `spp_utils.h` is provided, which implements only the spp::hash_combine() functionality. This is useful when we want to specify a hash function for a user-defined class in an header file, without including the full `sparsepp.h` header (this is demonstrated in [example 2](#example-2---providing-a-hash-function-for-a-user-defined-class) below).

## Usage

As shown in the example above, you need to include the header file: `#include <sparsepp.h>`

This provides the implementation for the following classes:

```c++
namespace spp
{
    template <class Key, 
              class T,
              class HashFcn  = spp_hash<Key>,  
              class EqualKey = std::equal_to<Key>,
              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
    class sparse_hash_map;

    template <class Value,
              class HashFcn  = spp_hash<Value>,
              class EqualKey = std::equal_to<Value>,
              class Alloc    = libc_allocator_with_realloc<Value>>
    class sparse_hash_set;
};
```

These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:

- Calls to erase() may invalidate iterators. However, conformant to the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example to delete odd values:

```c++
        for (auto it = c.begin(); it != c.end(); )
        if (it->first % 2 == 1)
            it = c.erase(it);
        else
            ++it;
```

- Since items are not grouped into buckets, Bucket APIs have been adapted: `max_bucket_count` is equivalent to `max_size`, and `bucket_count` returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.

## Example 2 - providing a hash function for a user-defined class

In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Even though a the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of `std::hash` for the class into the "std" namespace. For example:

```c++
#include <iostream>
#include <functional>
#include <string>
#include "sparsepp.h"

using std::string;

struct Person 
{
    bool operator==(const Person &o) const 
    { return _first == o._first && _last == o._last; }

    string _first;
    string _last;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            return seed;
        }
    };
}
 
int main()
{
    // As we have defined a specialization of std::hash() for Person, 
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    spp::sparse_hash_set<Person> persons = { { "John", "Galt" }, 
                                             { "Jane", "Doe" } };
    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << '\n';
}
```

The `std::hash` specialization for `Person` combines the hash values for both first and last name using the convenient spp::hash_combine function, and returns the combined hash value. 

spp::hash_combine is provided by the header `sparsepp.h`. However, class definitions often appear in header files, and it is desirable to limit the size of headers included in such header files, so we provide the very small header `spp_utils.h` for that purpose:

```c++
#include <string>
#include "spp_utils.h"

using std::string;
 
struct Person 
{
    bool operator==(const Person &o) const 
    { 
        return _first == o._first && _last == o._last && _age == o._age; 
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<> 
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            spp::hash_combine(seed, p._age);
            return seed;
        }
    };
}
```

## Example 3 - serialization

sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection.
This support is implemented in the following APIs:

```c++
    template <typename Serializer, typename OUTPUT>
    bool serialize(Serializer serializer, OUTPUT *stream);

    template <typename Serializer, typename INPUT>
    bool unserialize(Serializer serializer, INPUT *stream);
```

The following example demontrates how a simple sparse_hash_map can be written to a file, and then read back. The serializer we use read and writes to a file using the stdio APIs, but it would be equally simple to write a serialized using the stream APIS:

```c++
#include <cstdio>

#include "sparsepp.h"

using spp::sparse_hash_map;
using namespace std;

class FileSerializer 
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value) 
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value) 
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE
    // -----------------------------
    bool operator()(FILE *fp, const string& value) 
    {
        const size_t size = value.size();
        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
    }

    bool operator()(FILE *fp, string* value) 
    {
        size_t size;
        if (!(*this)(fp, &size)) 
            return false;
        char* buf = new char[size];
        if (fread(buf, size, 1, fp) != 1) 
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, (size_t)size);
        delete[] buf;
        return true;
    }

    // serialize std::pair<const A, B> to FILE - needed for maps
    // ---------------------------------------------------------
    template <class A, class B>
    bool operator()(FILE *fp, const std::pair<const A, B>& value)
    {
        return (*this)(fp, value.first) && (*this)(fp, value.second);
    }

    template <class A, class B>
    bool operator()(FILE *fp, std::pair<const A, B> *value) 
    {
        return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
    }
};

int main(int argc, char* argv[]) 
{
    sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };

    // serialize age hash_map to "ages.dmp" file
    FILE *out = fopen("ages.dmp", "wb");
    age.serialize(FileSerializer(), out);
    fclose(out);

    sparse_hash_map<string, int> age_read;

    // read from "ages.dmp" file into age_read hash_map 
    FILE *input = fopen("ages.dmp", "rb");
    age_read.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents of age_read to verify correct serialization
    for (auto& v : age_read)
        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}
```