[![Build Status](https://travis-ci.org/greg7mdp/sparsepp.svg?branch=master)](https://travis-ci.org/greg7mdp/sparsepp)

# Sparsepp: A fast, memory efficient hash map for C++

Sparsepp is derived from Google's excellent [sparsehash](https://github.com/sparsehash/sparsehash) implementation. It aims to achieve the following objectives:

- A drop-in alternative for unordered_map and unordered_set.
- **Extremely low memory usage** (typically about one byte overhead per entry).
- **Very efficient**, typically faster than your compiler's unordered map/set or Boost's.
- **C++11 support** (if supported by compiler).
- **Single header** implementation - just copy `sparsepp.h` to your project and include it.
- **Tested** on Windows (vs2010-2015, g++), Linux (g++, clang++) and MacOS (clang++).

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's `dense_hash_map` is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).

For a detailed comparison of various hash implementations, including Sparsepp, please see our [write-up](bench.md).

## Example

```c++
#include <iostream>
#include <string>
#include <sparsepp.h>

using spp::sparse_hash_map;

int main()
{
    // Create a sparse_hash_map of three entries (strings that map to strings)
    sparse_hash_map<std::string, std::string> email =
    {
        { "tom",  "tom@gmail.com"},
        { "jeff", "jk@gmail.com"},
        { "jim",  "jimg@microsoft.com"}
    };

    // Iterate and print keys and values
    for (const auto& n : email)
        std::cout << n.first << "'s email is: " << n.second << "\n";

    // Add a new entry
    email["bill"] = "bg@whatever.com";

    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";

    return 0;
}
```

## Installation

Since the full Sparsepp implementation is contained in a single header file `sparsepp.h`, installation consists of copying this header file to wherever it is convenient to include from your project(s).
Optionally, a second header file `spp_utils.h` is provided, which implements only the `spp::hash_combine()` functionality. This is useful when you want to specify a hash function for a user-defined class in a header file without including the full `sparsepp.h` header (this is demonstrated in [example 2](#example-2---providing-a-hash-function-for-a-user-defined-class) below).

## Usage

As shown in the example above, you need to include the header file: `#include <sparsepp.h>`

This provides the implementation for the following classes:

```c++
namespace spp
{
    // The HashFcn default shown here is assumed to be std::hash;
    // check sparsepp.h for the exact default used by your version.
    template <class Key,
              class T,
              class HashFcn  = std::hash<Key>,
              class EqualKey = std::equal_to<Key>,
              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
    class sparse_hash_map;

    template <class Value,
              class HashFcn  = std::hash<Value>,
              class EqualKey = std::equal_to<Value>,
              class Alloc    = libc_allocator_with_realloc<Value>>
    class sparse_hash_set;
}  // namespace spp
```

These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:

- Calls to `erase()` may invalidate iterators. However, in conformance with the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example, to delete the entries with odd keys:

```c++
for (auto it = c.begin(); it != c.end(); )
    if (it->first % 2 != 0)   // odd key (also true for negative odd keys)
        it = c.erase(it);     // erase returns the iterator following the erased entry
    else
        ++it;
```

- Since items are not grouped into buckets, Bucket APIs have been adapted: `max_bucket_count` is equivalent to `max_size`, and `bucket_count` returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.

## Example 2 - providing a hash function for a user-defined class

In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Although the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of `std::hash` for the class into the "std" namespace.
For example:

```c++
#include <iostream>
#include <functional>
#include <string>
#include "sparsepp.h"

using std::string;

struct Person
{
    bool operator==(const Person &o) const
    { return _first == o._first && _last == o._last; }

    string _first;
    string _last;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<>
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            return seed;
        }
    };
}

int main()
{
    // As we have defined a specialization of std::hash() for Person,
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    spp::sparse_hash_set<Person> persons = { { "John", "Galt" },
                                             { "Jane", "Doe" } };
    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << '\n';
}
```

The `std::hash` specialization for `Person` combines the hash values for both first and last name using the convenient `spp::hash_combine` function, and returns the combined hash value.

`spp::hash_combine` is provided by the header `sparsepp.h`.
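
To make the idea of combining hashes concrete, here is a minimal sketch of a Boost-style combine step. The `demo` namespace, the `hash_name` helper and the mixing formula are assumptions chosen for illustration; they are not taken from Sparsepp's sources, and `spp::hash_combine` may be implemented differently.

```cpp
#include <cstddef>
#include <functional>
#include <string>

namespace demo   // hypothetical namespace, not part of Sparsepp
{
    // Mix the hash of `v` into `seed`, so the result depends on every value
    // combined so far (Boost-style formula, assumed for illustration).
    template <class T>
    inline void hash_combine(std::size_t& seed, const T& v)
    {
        std::hash<T> hasher;
        seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }

    // Hash a (first, last) name pair the same way the Person example does.
    inline std::size_t hash_name(const std::string& first, const std::string& last)
    {
        std::size_t seed = 0;
        hash_combine(seed, first);
        hash_combine(seed, last);
        return seed;
    }
}
```

Two objects that compare equal must feed the same values in the same order, which is why the `Person` hash visits exactly the fields that `operator==` compares. In real code you would call `spp::hash_combine` rather than a hand-written helper like this one.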
However, class definitions often appear in header files, and it is desirable to limit the size of the headers included in such header files, so we provide the very small header `spp_utils.h` for that purpose:

```c++
#include <string>
#include "spp_utils.h"

using std::string;

struct Person
{
    bool operator==(const Person &o) const
    {
        return _first == o._first && _last == o._last && _age == o._age;
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<>
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            spp::hash_combine(seed, p._age);
            return seed;
        }
    };
}
```

## Example 3 - serialization

sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection. This support is implemented in the following APIs:

```c++
template <typename Serializer, typename OUTPUT>
bool serialize(Serializer serializer, OUTPUT *stream);

template <typename Serializer, typename INPUT>
bool unserialize(Serializer serializer, INPUT *stream);
```

The following example demonstrates how a simple sparse_hash_map can be written to a file and then read back.
The serializer we use reads from and writes to a file using the stdio APIs, but it would be equally simple to write a serializer using the stream APIs:

```c++
#include <cstdio>
#include <string>
#include <utility>

#include "sparsepp.h"

using spp::sparse_hash_map;
using namespace std;

class FileSerializer
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value)
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value)
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE (length first, then the characters)
    // -----------------------------------------------------------------
    bool operator()(FILE *fp, const string& value)
    {
        const size_t size = value.size();
        // skip fwrite for empty strings: writing 0 bytes returns 0, not 1
        return (*this)(fp, size) && (size == 0 || fwrite(value.c_str(), size, 1, fp) == 1);
    }

    bool operator()(FILE *fp, string* value)
    {
        size_t size;
        if (!(*this)(fp, &size))
            return false;
        char* buf = new char[size];
        if (size != 0 && fread(buf, size, 1, fp) != 1)
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, size);   // construct the string in place at `value`
        delete[] buf;
        return true;
    }

    // serialize std::pair<const A, B> to FILE - needed for maps
    // ---------------------------------------------------------
    template <class A, class B>
    bool operator()(FILE *fp, const std::pair<const A, B>& value)
    {
        return (*this)(fp, value.first) && (*this)(fp, value.second);
    }

    template <class A, class B>
    bool operator()(FILE *fp, std::pair<const A, B> *value)
    {
        // the key is const inside the pair, so cast the constness away to fill it in
        return (*this)(fp, const_cast<A *>(&value->first)) && (*this)(fp, &value->second);
    }
};

int main()
{
    sparse_hash_map<string, int> age{ { "John", 12 }, { "Jane", 13 }, { "Fred", 8 } };

    // serialize age hash_map to "ages.dmp" file
    FILE *out = fopen("ages.dmp", "wb");
    if (!out)
        return 1;
    age.serialize(FileSerializer(), out);
    fclose(out);

    sparse_hash_map<string, int> age_read;

    // read from "ages.dmp" file into age_read hash_map
    FILE *input = fopen("ages.dmp", "rb");
    if (!input)
        return 1;
    age_read.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents of age_read to verify correct serialization
    for (auto& v : age_read)
        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}
```
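
As mentioned above, a serializer can also be built on the stream APIs. The following is a minimal sketch of what that might look like. `StreamSerializer` and `round_trip_demo` are hypothetical names introduced here, and the class simply mirrors the overload set and placement-new convention of `FileSerializer`. The demo exercises the functor on its own with an in-memory stream; it does not call Sparsepp.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <new>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stream-based counterpart of FileSerializer.
class StreamSerializer
{
public:
    // basic types: copied as raw bytes
    template <class T>
    bool operator()(std::ostream *os, const T& value)
    {
        os->write(reinterpret_cast<const char *>(&value), sizeof(value));
        return !os->fail();
    }

    template <class T>
    bool operator()(std::istream *is, T *value)
    {
        is->read(reinterpret_cast<char *>(value), sizeof(*value));
        return !is->fail();
    }

    // std::string: length first, then the characters
    bool operator()(std::ostream *os, const std::string& value)
    {
        const std::size_t size = value.size();
        if (!(*this)(os, size))
            return false;
        os->write(value.data(), static_cast<std::streamsize>(size));
        return !os->fail();
    }

    bool operator()(std::istream *is, std::string *value)
    {
        std::size_t size = 0;
        if (!(*this)(is, &size))
            return false;
        std::string tmp(size, '\0');
        if (size != 0)
            is->read(&tmp[0], static_cast<std::streamsize>(size));
        if (is->fail())
            return false;
        new (value) std::string(tmp);   // construct in place, as FileSerializer does
        return true;
    }

    // std::pair<const A, B>: needed for maps
    template <class A, class B>
    bool operator()(std::ostream *os, const std::pair<const A, B>& value)
    {
        return (*this)(os, value.first) && (*this)(os, value.second);
    }

    template <class A, class B>
    bool operator()(std::istream *is, std::pair<const A, B> *value)
    {
        return (*this)(is, const_cast<A *>(&value->first)) && (*this)(is, &value->second);
    }
};

// Round-trip one map entry through an in-memory stream (illustration only).
inline bool round_trip_demo()
{
    typedef std::pair<const std::string, int> Entry;
    std::stringstream buffer;
    std::ostream *out = &buffer;   // explicit base-class pointers select the
    std::istream *in  = &buffer;   // write and read overloads respectively

    StreamSerializer ser;
    const Entry written("John", 12);
    if (!ser(out, written))
        return false;

    // Read into raw storage, because the string overload constructs in place.
    alignas(Entry) unsigned char raw[sizeof(Entry)];
    Entry *read = reinterpret_cast<Entry *>(raw);
    if (!ser(in, read))
        return false;

    const bool same = read->first == written.first && read->second == written.second;
    read->~Entry();
    return same;
}
```

The write overloads take a `std::ostream *` and the read overloads a `std::istream *`, so pass a pointer of exactly that base type; a `std::stringstream *` or `std::fstream *` converts to both and would make the call ambiguous. Whether `serialize` and `unserialize` accept stream pointers in the same way they accept `FILE *` is an assumption you should check against your copy of `sparsepp.h`.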