Sparsepp: A fast, memory efficient hash map for C++
Sparsepp is derived from Google's excellent sparsehash implementation. It aims to achieve the following objectives:
- A drop-in alternative for unordered_map and unordered_set.
- Extremely low memory usage (typically about one byte overhead per entry).
- Very efficient, typically faster than your compiler's unordered map/set or Boost's.
- C++11 support (if supported by compiler).
- Single header implementation - just copy sparsepp.h to your project and include it.
- Tested on Windows (vs2010-2015, g++), Linux (g++, clang++) and MacOS (clang++).
We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's unordered_map on both counts. Only Google's dense_hash_map is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).
For a detailed comparison of various hash implementations, including Sparsepp, please see our write-up.
Example
#include <iostream>
#include <string>
#include <sparsepp.h>
using spp::sparse_hash_map;
int main()
{
// Create a sparse_hash_map of three strings (that map to strings)
sparse_hash_map<std::string, std::string> email =
{
{ "tom", "tom@gmail.com"},
{ "jeff", "jk@gmail.com"},
{ "jim", "jimg@microsoft.com"}
};
// Iterate and print keys and values
for (const auto& n : email)
std::cout << n.first << "'s email is: " << n.second << "\n";
// Add a new entry
email["bill"] = "bg@whatever.com";
// and print it
std::cout << "bill's email is: " << email["bill"] << "\n";
return 0;
}
Installation
Since the full Sparsepp implementation is contained in the single header file sparsepp.h, installation consists of copying this header file to wherever it is convenient to include it from your project(s).
Optionally, a second header file spp_utils.h is provided, which implements only the spp::hash_combine() functionality. This is useful when you want to specify a hash function for a user-defined class in a header file without including the full sparsepp.h header (this is demonstrated in example 2 below).
Usage
As shown in the example above, you need to include the header file: #include <sparsepp.h>
This provides the implementation for the following classes:
namespace spp
{
template <class Key,
class T,
class HashFcn = spp_hash<Key>,
class EqualKey = std::equal_to<Key>,
class Alloc = libc_allocator_with_realloc<std::pair<const Key, T>>>
class sparse_hash_map;
template <class Value,
class HashFcn = spp_hash<Value>,
class EqualKey = std::equal_to<Value>,
class Alloc = libc_allocator_with_realloc<Value>>
class sparse_hash_set;
};
These classes provide the same interface as std::unordered_map and std::unordered_set, with the following differences:
- Calls to erase() may invalidate iterators. However, in conformance with the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example, to delete all entries with an odd key:
for (auto it = c.begin(); it != c.end(); )
if (it->first % 2 == 1)
it = c.erase(it);
else
++it;
- Since items are not grouped into buckets, Bucket APIs have been adapted: max_bucket_count is equivalent to max_size, and bucket_count returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.
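The erase-while-iterating pattern from the first bullet can be wrapped in a small reusable helper. The sketch below is only an illustration: erase_keys_if and demo_erase_odd_keys are hypothetical names that are not part of Sparsepp, and std::unordered_map is used as a stand-in container so the snippet builds without the Sparsepp header. Given the interface described above, spp::sparse_hash_map should be usable in its place.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper (not part of Sparsepp): erase every entry whose key
// satisfies pred and return the number of entries removed.
template <class Map, class Pred>
std::size_t erase_keys_if(Map& m, Pred pred)
{
    std::size_t removed = 0;
    for (auto it = m.begin(); it != m.end(); )
    {
        if (pred(it->first))
        {
            // erase() returns the iterator following the erased element,
            // so the loop never touches an invalidated iterator
            it = m.erase(it);
            ++removed;
        }
        else
            ++it;
    }
    return removed;
}

// Stand-in demo: std::unordered_map is used here only so that the snippet
// compiles on its own; the container type is an assumption of this sketch.
inline std::size_t demo_erase_odd_keys()
{
    std::unordered_map<int, int> c = { {1, 10}, {2, 20}, {3, 30}, {4, 40} };
    erase_keys_if(c, [](int k) { return k % 2 != 0; });
    return c.size();   // keys 2 and 4 remain
}
```

The predicate tests `k % 2 != 0` rather than `k % 2 == 1` so that negative odd keys are matched as well, since `%` yields a negative remainder for negative operands in C++11.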
Example 2 - providing a hash function for a user-defined class
In order to use a sparse_hash_set or sparse_hash_map, a hash function should be provided. Even though the hash function can be provided via the HashFcn template parameter, we recommend injecting a specialization of std::hash for the class into the "std" namespace. For example:
#include <iostream>
#include <functional>
#include <string>
#include "sparsepp.h"
using std::string;
struct Person
{
bool operator==(const Person &o) const
{ return _first == o._first && _last == o._last; }
string _first;
string _last;
};
namespace std
{
// inject specialization of std::hash for Person into namespace std
// ----------------------------------------------------------------
template<>
struct hash<Person>
{
std::size_t operator()(Person const &p) const
{
std::size_t seed = 0;
spp::hash_combine(seed, p._first);
spp::hash_combine(seed, p._last);
return seed;
}
};
}
int main()
{
// As we have defined a specialization of std::hash() for Person,
// we can now create sparse_hash_set or sparse_hash_map of Persons
// ----------------------------------------------------------------
spp::sparse_hash_set<Person> persons = { { "John", "Galt" },
{ "Jane", "Doe" } };
for (auto& p: persons)
std::cout << p._first << ' ' << p._last << '\n';
}
The std::hash specialization for Person combines the hash values for both first and last name using the convenient spp::hash_combine function, and returns the combined hash value.
spp::hash_combine is provided by the header sparsepp.h. However, class definitions often appear in header files, and it is desirable to limit the size of headers included in such header files, so we provide the very small header spp_utils.h for that purpose:
#include <string>
#include "spp_utils.h"
using std::string;
struct Person
{
bool operator==(const Person &o) const
{
return _first == o._first && _last == o._last && _age == o._age;
}
string _first;
string _last;
int _age;
};
namespace std
{
// inject specialization of std::hash for Person into namespace std
// ----------------------------------------------------------------
template<>
struct hash<Person>
{
std::size_t operator()(Person const &p) const
{
std::size_t seed = 0;
spp::hash_combine(seed, p._first);
spp::hash_combine(seed, p._last);
spp::hash_combine(seed, p._age);
return seed;
}
};
}
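As mentioned at the start of this example, the hash function can also be supplied through the HashFcn template parameter instead of specializing std::hash. The following is a minimal sketch of that alternative. PersonHash, hash_combine_sketch and PersonSet are hypothetical names; hash_combine_sketch uses the well-known Boost-style mixing formula as a stand-in for spp::hash_combine (whose implementation is not shown in this document and may differ), and std::unordered_set stands in for spp::sparse_hash_set so the snippet builds with the standard library alone.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Person
{
    bool operator==(const Person &o) const
    { return _first == o._first && _last == o._last; }

    std::string _first;
    std::string _last;
};

// Stand-in for spp::hash_combine (assumption: Boost-style mixing step).
template <class T>
void hash_combine_sketch(std::size_t &seed, const T &v)
{
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical hash functor, passed explicitly as the HashFcn parameter
// instead of being injected into namespace std.
struct PersonHash
{
    std::size_t operator()(const Person &p) const
    {
        std::size_t seed = 0;
        hash_combine_sketch(seed, p._first);
        hash_combine_sketch(seed, p._last);
        return seed;
    }
};

// With Sparsepp this would presumably read
// spp::sparse_hash_set<Person, PersonHash>.
using PersonSet = std::unordered_set<Person, PersonHash>;
```

This keeps namespace std untouched, at the cost of having to name the functor wherever the container type is spelled out.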
Example 3 - serialization
sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection. This support is implemented in the following APIs:
template <typename Serializer, typename OUTPUT>
bool serialize(Serializer serializer, OUTPUT *stream);
template <typename Serializer, typename INPUT>
bool unserialize(Serializer serializer, INPUT *stream);
The following example demonstrates how a simple sparse_hash_map can be written to a file and then read back. The serializer we use reads from and writes to a file using the stdio APIs, but it would be equally simple to write a serializer using the stream APIs:
#include <cstdio>
#include <new>     // placement new, used when reading strings
#include <string>
#include "sparsepp.h"
using spp::sparse_hash_map;
using namespace std;
class FileSerializer
{
public:
// serialize basic types to FILE
// -----------------------------
template <class T>
bool operator()(FILE *fp, const T& value)
{
return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
}
template <class T>
bool operator()(FILE *fp, T* value)
{
return fread((void *)value, sizeof(*value), 1, fp) == 1;
}
// serialize std::string to FILE
// -----------------------------
bool operator()(FILE *fp, const string& value)
{
const size_t size = value.size();
// fwrite() returns 0 when size is 0, so an empty string needs no payload write
return (*this)(fp, size) && (size == 0 || fwrite(value.c_str(), size, 1, fp) == 1);
}
bool operator()(FILE *fp, string* value)
{
size_t size;
if (!(*this)(fp, &size))
return false;
char* buf = new char[size];
// fread() likewise returns 0 when size is 0, so only read non-empty payloads
if (size > 0 && fread(buf, size, 1, fp) != 1)
{
delete [] buf;
return false;
}
new (value) string(buf, (size_t)size);
delete[] buf;
return true;
}
// serialize std::pair<const A, B> to FILE - needed for maps
// ---------------------------------------------------------
template <class A, class B>
bool operator()(FILE *fp, const std::pair<const A, B>& value)
{
return (*this)(fp, value.first) && (*this)(fp, value.second);
}
template <class A, class B>
bool operator()(FILE *fp, std::pair<const A, B> *value)
{
return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
}
};
int main(int argc, char* argv[])
{
sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };
// serialize age hash_map to "ages.dmp" file
FILE *out = fopen("ages.dmp", "wb");
age.serialize(FileSerializer(), out);
fclose(out);
sparse_hash_map<string, int> age_read;
// read from "ages.dmp" file into age_read hash_map
FILE *input = fopen("ages.dmp", "rb");
age_read.unserialize(FileSerializer(), input);
fclose(input);
// print out contents of age_read to verify correct serialization
for (auto& v : age_read)
printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}