[![Build Status](https://travis-ci.org/greg7mdp/sparsepp.svg?branch=master)](https://travis-ci.org/greg7mdp/sparsepp)

# Sparsepp: A fast, memory efficient hash map for C++

Sparsepp is derived from Google's excellent [sparsehash](https://github.com/sparsehash/sparsehash) implementation. It aims to achieve the following objectives:

- A drop-in alternative for `unordered_map` and `unordered_set`.
- **Extremely low memory usage** (typically about one byte overhead per entry).
- **Very efficient**, typically faster than your compiler's unordered map/set or Boost's.
- **C++11 support** (if supported by the compiler).
- **Single header** implementation - just copy `sparsepp.h` to your project and include it.
- **Tested** on Windows (vs2010-2015, g++), Linux (g++, clang++) and macOS (clang++).

We believe Sparsepp provides an unparalleled combination of performance and memory usage, and will outperform your compiler's `unordered_map` on both counts. Only Google's `dense_hash_map` is consistently faster, at the cost of much greater memory usage (especially when the final size of the map is not known in advance).

For a detailed comparison of various hash implementations, including Sparsepp, please see our [write-up](bench.md).

## Example
```c++
#include <iostream>
#include <string>
#include <sparsepp.h>

using spp::sparse_hash_map;

int main()
{
    // Create a sparse_hash_map of three strings (that map to strings)
    sparse_hash_map<std::string, std::string> email =
    {
        { "tom",  "tom@gmail.com"},
        { "jeff", "jk@gmail.com"},
        { "jim",  "jimg@microsoft.com"}
    };

    // Iterate and print keys and values
    for (const auto& n : email)
        std::cout << n.first << "'s email is: " << n.second << "\n";

    // Add a new entry
    email["bill"] = "bg@whatever.com";

    // and print it
    std::cout << "bill's email is: " << email["bill"] << "\n";

    return 0;
}
```
## Installation

Since the full Sparsepp implementation is contained in a single header file `sparsepp.h`, installation consists of copying this header file to wherever it is convenient to include from your project(s).

Optionally, a second header file `spp_utils.h` is provided, which implements only the `spp::hash_combine()` functionality. This is useful when we want to specify a hash function for a user-defined class in a header file, without including the full `sparsepp.h` header (this is demonstrated in [example 2](#example-2---providing-a-hash-function-for-a-user-defined-class) below).

## Usage

As shown in the example above, you need to include the header file: `#include <sparsepp.h>`

This provides the implementation for the following classes:
```c++
namespace spp
{
    template <class Key,
              class T,
              class HashFcn  = spp_hash<Key>,
              class EqualKey = std::equal_to<Key>,
              class Alloc    = libc_allocator_with_realloc<std::pair<const Key, T>>>
    class sparse_hash_map;

    template <class Value,
              class HashFcn  = spp_hash<Value>,
              class EqualKey = std::equal_to<Value>,
              class Alloc    = libc_allocator_with_realloc<Value>>
    class sparse_hash_set;
}
```
These classes provide the same interface as `std::unordered_map` and `std::unordered_set`, with the following differences:

- Calls to `erase()` may invalidate iterators. However, in conformance with the C++11 standard, the position and range erase functions return an iterator pointing to the position immediately following the last of the elements erased. This makes it easy to traverse a sparse hash table and delete elements matching a condition. For example, to delete the entries with odd keys:

  ```c++
  for (auto it = c.begin(); it != c.end(); )
      if (it->first % 2 == 1)
          it = c.erase(it);
      else
          ++it;
  ```

- Since items are not grouped into buckets, the bucket APIs have been adapted: `max_bucket_count` is equivalent to `max_size`, and `bucket_count` returns the sparsetable size, which is normally at least twice the number of items inserted into the hash_map.
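
The erase-while-iterating pattern from the first bullet can also be written as a complete function. The following is a minimal sketch that uses `std::unordered_map` as a stand-in so that it compiles without Sparsepp. The `IntMap` alias and the `erase_odd_keys` name are illustrative only; given the matching interface described above, the same loop should work unchanged with `spp::sparse_hash_map<int, int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Stand-in alias (assumption): replace with spp::sparse_hash_map<int, int>
// and include "sparsepp.h" to run the same code against Sparsepp.
typedef std::unordered_map<int, int> IntMap;

// Erase every entry whose key is odd; returns the number of entries removed.
std::size_t erase_odd_keys(IntMap& c)
{
    std::size_t removed = 0;
    for (IntMap::iterator it = c.begin(); it != c.end(); )
    {
        if (it->first % 2 != 0)   // != 0 also matches negative odd keys
        {
            it = c.erase(it);     // continue from the iterator returned by erase()
            ++removed;
        }
        else
            ++it;
    }
    return removed;
}
```

The important detail is that the loop never increments an iterator that has been passed to `erase()`; it always continues from the iterator that `erase()` returns.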
## Example 2 - providing a hash function for a user-defined class

In order to use a user-defined class in a sparse_hash_set, or as the key of a sparse_hash_map, a hash function must be provided for that class. Even though the hash function can be provided via the `HashFcn` template parameter, we recommend injecting a specialization of `std::hash` for the class into the `std` namespace. For example:
```c++
#include <iostream>
#include <functional>
#include <string>
#include "sparsepp.h"

using std::string;

struct Person
{
    bool operator==(const Person &o) const
    { return _first == o._first && _last == o._last; }

    string _first;
    string _last;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<>
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            return seed;
        }
    };
}

int main()
{
    // As we have defined a specialization of std::hash() for Person,
    // we can now create sparse_hash_set or sparse_hash_map of Persons
    // ----------------------------------------------------------------
    spp::sparse_hash_set<Person> persons = { { "John", "Galt" },
                                             { "Jane", "Doe" } };
    for (auto& p: persons)
        std::cout << p._first << ' ' << p._last << '\n';
}
```

The `std::hash` specialization for `Person` combines the hash values for both first and last name using the convenient `spp::hash_combine` function, and returns the combined hash value.
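
For comparison, the alternative mentioned at the start of this example, passing the hash function through the `HashFcn` template parameter, might look like the sketch below. It uses `std::unordered_set` as a stand-in so that it compiles without Sparsepp; `PersonHash` and `PersonSet` are hypothetical names, and the mixing formula is a generic one chosen for illustration rather than Sparsepp's own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

struct Person
{
    bool operator==(const Person &o) const
    { return _first == o._first && _last == o._last; }

    std::string _first;
    std::string _last;
};

// Hypothetical hash functor, supplied explicitly instead of specializing std::hash
struct PersonHash
{
    std::size_t operator()(const Person &p) const
    {
        // Generic mixing of the two member hashes (illustrative only)
        const std::size_t h1 = std::hash<std::string>()(p._first);
        const std::size_t h2 = std::hash<std::string>()(p._last);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Stand-in (assumption): with Sparsepp this would read
// spp::sparse_hash_set<Person, PersonHash>
typedef std::unordered_set<Person, PersonHash> PersonSet;
```

The drawback of this approach is that the functor has to be named at every place the container type is spelled out, whereas a `std::hash` specialization is picked up automatically.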

`spp::hash_combine` is provided by the header `sparsepp.h`. However, class definitions often appear in header files, and it is desirable to limit the size of the headers they include, so we provide the very small header `spp_utils.h` for that purpose:
```c++
#include <string>
#include "spp_utils.h"

using std::string;

struct Person
{
    bool operator==(const Person &o) const
    {
        return _first == o._first && _last == o._last && _age == o._age;
    }

    string _first;
    string _last;
    int    _age;
};

namespace std
{
    // inject specialization of std::hash for Person into namespace std
    // ----------------------------------------------------------------
    template<>
    struct hash<Person>
    {
        std::size_t operator()(Person const &p) const
        {
            std::size_t seed = 0;
            spp::hash_combine(seed, p._first);
            spp::hash_combine(seed, p._last);
            spp::hash_combine(seed, p._age);
            return seed;
        }
    };
}
```
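
To give an idea of what a combining function of this kind does, here is a minimal sketch of the well-known Boost-style `hash_combine` technique. It is an assumption that `spp::hash_combine` behaves similarly; the actual implementation in `spp_utils.h` may differ. `my_hash_combine` and `hash_person` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for spp::hash_combine (Boost-style mixing):
// folds the hash of `v` into the running `seed`.
template <class T>
void my_hash_combine(std::size_t& seed, const T& v)
{
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Combines three fields into one hash value, mirroring the
// std::hash<Person> specialization shown above.
std::size_t hash_person(const std::string& first, const std::string& last, int age)
{
    std::size_t seed = 0;
    my_hash_combine(seed, first);
    my_hash_combine(seed, last);
    my_hash_combine(seed, age);
    return seed;
}
```

Because each call mixes the previous seed into the result, the combined value depends on the order in which the fields are folded in, which is why the members should always be combined in the same order.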
## Example 3 - serialization

sparse_hash_set and sparse_hash_map can easily be serialized/unserialized to a file or network connection.

This support is implemented in the following APIs:

```c++
template <typename Serializer, typename OUTPUT>
bool serialize(Serializer serializer, OUTPUT *stream);

template <typename Serializer, typename INPUT>
bool unserialize(Serializer serializer, INPUT *stream);
```

The following example demonstrates how a simple sparse_hash_map can be written to a file, and then read back. The serializer we use reads and writes to a file using the stdio APIs, but it would be equally simple to write a serializer using the stream APIs:
```c++
#include <cstdio>
#include <string>

#include "sparsepp.h"

using spp::sparse_hash_map;
using namespace std;

class FileSerializer
{
public:
    // serialize basic types to FILE
    // -----------------------------
    template <class T>
    bool operator()(FILE *fp, const T& value)
    {
        return fwrite((const void *)&value, sizeof(value), 1, fp) == 1;
    }

    template <class T>
    bool operator()(FILE *fp, T* value)
    {
        return fread((void *)value, sizeof(*value), 1, fp) == 1;
    }

    // serialize std::string to FILE
    // -----------------------------
    bool operator()(FILE *fp, const string& value)
    {
        const size_t size = value.size();
        return (*this)(fp, size) && fwrite(value.c_str(), size, 1, fp) == 1;
    }

    bool operator()(FILE *fp, string* value)
    {
        size_t size;
        if (!(*this)(fp, &size))
            return false;
        char* buf = new char[size];
        if (fread(buf, size, 1, fp) != 1)
        {
            delete [] buf;
            return false;
        }
        new (value) string(buf, (size_t)size);
        delete[] buf;
        return true;
    }

    // serialize std::pair<const A, B> to FILE - needed for maps
    // ---------------------------------------------------------
    template <class A, class B>
    bool operator()(FILE *fp, const std::pair<const A, B>& value)
    {
        return (*this)(fp, value.first) && (*this)(fp, value.second);
    }

    template <class A, class B>
    bool operator()(FILE *fp, std::pair<const A, B> *value)
    {
        return (*this)(fp, (A *)&value->first) && (*this)(fp, &value->second);
    }
};

int main(int argc, char* argv[])
{
    sparse_hash_map<string, int> age{ { "John", 12 }, {"Jane", 13 }, { "Fred", 8 } };

    // serialize age hash_map to "ages.dmp" file
    FILE *out = fopen("ages.dmp", "wb");
    age.serialize(FileSerializer(), out);
    fclose(out);

    sparse_hash_map<string, int> age_read;

    // read from "ages.dmp" file into age_read hash_map
    FILE *input = fopen("ages.dmp", "rb");
    age_read.unserialize(FileSerializer(), input);
    fclose(input);

    // print out contents of age_read to verify correct serialization
    for (auto& v : age_read)
        printf("age_read: %s -> %d\n", v.first.c_str(), v.second);
}
```
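
As noted before the example, a serializer can also be written against the stream APIs. The sketch below shows what the basic-type and `std::string` overloads might look like for `std::ostream*` and `std::istream*`. `StreamSerializer` and `round_trip` are hypothetical names, not part of Sparsepp, and the sketch assumes that the containers accept stream pointers as the `stream` argument, which this README does not confirm. The `std::pair` overloads needed for maps would follow the same pattern as in `FileSerializer`.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <new>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stream-based counterpart of FileSerializer
class StreamSerializer
{
public:
    // write/read basic (trivially copyable) types as raw bytes
    template <class T>
    bool operator()(std::ostream *os, const T& value)
    {
        os->write(reinterpret_cast<const char *>(&value), sizeof(value));
        return os->good();
    }

    template <class T>
    bool operator()(std::istream *is, T* value)
    {
        is->read(reinterpret_cast<char *>(value), sizeof(*value));
        return is->good();
    }

    // std::string: length prefix followed by the characters
    bool operator()(std::ostream *os, const std::string& value)
    {
        const std::size_t size = value.size();
        if (!(*this)(os, size))
            return false;
        os->write(value.data(), static_cast<std::streamsize>(size));
        return os->good();
    }

    // As in FileSerializer, `value` is assumed to point at raw storage,
    // so the string is constructed in place with placement new
    bool operator()(std::istream *is, std::string* value)
    {
        std::size_t size = 0;
        if (!(*this)(is, &size))
            return false;
        std::vector<char> buf(size);
        if (size > 0)
        {
            is->read(&buf[0], static_cast<std::streamsize>(size));
            if (!is->good())
                return false;
        }
        new (value) std::string(buf.begin(), buf.end());
        return true;
    }
};

// Round-trips one string through in-memory streams and returns the copy
// read back (an empty string if any step fails).
std::string round_trip(const std::string& s)
{
    StreamSerializer ser;
    std::ostringstream out;
    std::ostream *os = &out;
    if (!ser(os, s))
        return std::string();

    std::istringstream in(out.str());
    std::istream *is = &in;
    alignas(std::string) unsigned char raw[sizeof(std::string)];  // raw storage
    std::string *p = reinterpret_cast<std::string *>(raw);
    if (!ser(is, p))
        return std::string();
    std::string copy(*p);
    p->~basic_string();   // destroy the string built by placement new
    return copy;
}
```

Like the stdio version, this writes values in the host's native byte layout, so a file produced this way is only portable between machines with the same word size and endianness.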