.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix")
        .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index; */
                  sizeof(float) }                       /* one row step skips m.cols() floats */
            );
        });

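
The role of the byte strides can be illustrated from the Python side. The
following is a minimal sketch using only the standard library; it assumes a
dense row-major layout, and the names ``raw``, ``strides`` and ``element`` are
hypothetical and not part of pybind11.

.. code-block:: python

    import struct

    rows, cols = 2, 3
    itemsize = struct.calcsize("f")      # size of one C float in bytes

    # Row-major storage of [[0, 1, 2], [3, 4, 5]] as raw bytes.
    raw = struct.pack(f"{rows * cols}f", *range(rows * cols))

    # Dense row-major byte strides: stepping one row skips `cols` items.
    strides = (itemsize * cols, itemsize)

    def element(i, j):
        """Read element (i, j) by applying the strides to the raw buffer."""
        offset = i * strides[0] + j * strides[1]
        return struct.unpack_from("f", raw, offset)[0]

    print(element(1, 2))

A consumer of the buffer locates every element this way, which is why the
first stride has to be the size of a full row.
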
The snippet above binds a lambda function that creates a ``py::buffer_info``
description record on demand for a given matrix. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        size_t itemsize;
        std::string format;
        int ndim;
        std::vector<size_t> shape;
        std::vector<size_t> strides;
    };

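
The same fields can be inspected from Python through a ``memoryview``. The
sketch below uses a standard-library ``array`` as a stand-in for a bound
``Matrix`` (an assumption made purely for illustration) and collects the
attributes that correspond to the members of ``py::buffer_info``.

.. code-block:: python

    from array import array

    data = array("f", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

    # Reinterpret the flat buffer as a 2 x 3 matrix without copying.
    view = memoryview(data).cast("B").cast("f", shape=[2, 3])

    info = {
        "itemsize": view.itemsize,
        "format": view.format,
        "ndim": view.ndim,
        "shape": view.shape,
        "strides": view.strides,
    }

    # Writing through the view changes the original storage: no copy was made.
    view[1, 2] = 60.0

After the assignment, ``data[5]`` holds the new value, which is the zero-copy
behaviour that a buffer view is meant to provide.
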
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below is a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix")
        .def("__init__", [](Matrix &m, py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert byte strides into element strides (outer, inner) */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            new (&m) Matrix(map);
        });

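
The sanity checks in the constructor can be mimicked in pure Python to see
which buffers would be accepted. This is only a sketch: ``element_strides`` is
a hypothetical helper, ``memoryview`` stands in for ``b.request()``, and the
exception types are chosen for the illustration.

.. code-block:: python

    from array import array

    def element_strides(obj):
        """Validate a buffer like the constructor above and return its
        strides measured in elements rather than bytes."""
        info = memoryview(obj)             # plays the role of b.request()
        if info.format != "d":
            raise TypeError("Incompatible format: expected a double array!")
        if info.ndim != 2:
            raise ValueError("Incompatible buffer dimension!")
        return tuple(s // info.itemsize for s in info.strides)

    # Two 2 x 3 test buffers: one of doubles, one of single-precision floats.
    doubles = memoryview(array("d", range(6))).cast("B").cast("d", shape=[2, 3])
    floats = memoryview(array("f", range(6))).cast("B").cast("f", shape=[2, 3])

    print(element_strides(doubles))

Passing ``floats`` or a one-dimensional buffer to ``element_strides`` raises an
exception, just as the C++ constructor rejects such input.
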
For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                /* Pointer to buffer */
            sizeof(Scalar),          /* Size of one scalar */
            /* Python struct-style format descriptor */
            py::format_descriptor<Scalar>::format(),
            /* Number of dimensions */
            2,
            /* Buffer dimensions */
            { (size_t) m.rows(),
              (size_t) m.cols() },
            /* Strides (in bytes) for each index */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
        );
    })

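
The stride expression above can be cross-checked against the strides that
NumPy itself reports for dense arrays. The sketch below assumes NumPy is
installed; ``byte_strides`` is a hypothetical helper that restates the C++
expression in Python.

.. code-block:: python

    import numpy as np

    def byte_strides(rows, cols, itemsize, row_major):
        """Python restatement of the stride expression in def_buffer()."""
        return (itemsize * (cols if row_major else 1),
                itemsize * (1 if row_major else rows))

    c_order = np.zeros((3, 4), dtype=np.float64, order="C")   # row-major
    f_order = np.zeros((3, 4), dtype=np.float64, order="F")   # column-major

    print(byte_strides(3, 4, 8, True), c_order.strides)
    print(byte_strides(3, 4, 8, False), f_order.strides)

Both lines print matching pairs, which suggests that the two branches of the
``rowMajor`` conditional describe C and Fortran ordering, respectively.
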
For a much easier approach of binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. Note that this feature requires the
:file:`pybind11/numpy.h` header to be included.

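
The effect of this conversion is roughly comparable to calling
``numpy.asarray`` with an explicit ``dtype``. The sketch below is a Python-side
analogy under that assumption, not a description of pybind11's internals, and
``as_double_array`` is a hypothetical name.

.. code-block:: python

    import numpy as np

    def as_double_array(obj):
        """Rough Python-side analogue of accepting py::array_t<double>."""
        return np.asarray(obj, dtype=np.float64)

    from_int = as_double_array(3)            # zero-dimensional array
    from_list = as_double_array([1, 2, 3])   # integers become doubles
    already = np.arange(3, dtype=np.float64)

    print(from_list.dtype, as_double_array(already) is already)

An input that already has the requested type is passed through unchanged,
while other inputs are converted into a new array.
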
Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

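
Whether an array is dense in C or Fortran order can be checked from Python
through its ``flags``. The following sketch assumes NumPy is installed and uses
``numpy.ascontiguousarray`` and ``numpy.asfortranarray`` to illustrate the kind
of conversion that a non-conforming argument would need.

.. code-block:: python

    import numpy as np

    a = np.arange(6, dtype=np.float64).reshape(2, 3)   # dense, C-ordered
    t = a.T                                            # same memory, column-major
    every_other = a[:, ::2]                            # strided, not dense

    dense_c = np.ascontiguousarray(every_other)        # copy into C order
    dense_f = np.asfortranarray(a)                     # copy into Fortran order

    print(a.flags.c_contiguous, t.flags.c_contiguous, every_other.flags.c_contiguous)

Only ``a`` and ``dense_c`` satisfy a C-style requirement directly; ``t`` and
``dense_f`` satisfy a Fortran-style one, and ``every_other`` satisfies neither
without a copy.
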
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first need
to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, which expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    PYBIND11_NUMPY_DTYPE(A, x, y);
    PYBIND11_NUMPY_DTYPE(B, z, a);

    /* now both A and B can be used as template arguments to py::array_t */

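
On the Python side, such records correspond to NumPy structured dtypes. The
sketch below builds dtypes by hand that are meant to resemble ``A`` and ``B``.
It assumes that C ``int`` is 32 bits wide and uses ``align=True`` to request
C-like padding; the dtype that pybind11 actually registers may differ in
detail.

.. code-block:: python

    import numpy as np

    # Assumed field types: C int -> int32, C double -> float64.
    dtype_a = np.dtype([("x", np.int32), ("y", np.float64)], align=True)
    dtype_b = np.dtype([("z", np.int32), ("a", dtype_a)], align=True)

    records = np.zeros(2, dtype=dtype_b)
    records[0] = (1, (2, 3.5))      # z = 1, a.x = 2, a.y = 3.5

    print(records["a"]["y"])

Nested fields are reached by chained indexing, so ``records["a"]["y"]`` yields
the ``y`` member of every ``B::a``.
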
Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below results in 4 calls to ``my_func``, one
for each array element. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array with dtype
``numpy.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
have dtype ``numpy.int64`` but need to be ``numpy.int32`` and
``numpy.float32``, respectively).

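
The broadcasting behaviour described above can be emulated in pure Python to
make the four calls visible. This is only a sketch of the semantics, assuming
NumPy is installed: the body of ``my_func`` is hypothetical (the text gives
only its signature), and the explicit loop is far slower than the compiled
loop that ``py::vectorize`` generates.

.. code-block:: python

    import numpy as np

    calls = 0

    def my_func(x, y, z):
        # Hypothetical body; only the C++ signature is given in the text.
        global calls
        calls += 1
        return float(x) * float(y) + float(z)

    x = np.array([[1, 3], [5, 7]])
    y = np.array([[2, 4], [6, 8]])
    z = 3

    # Cast to the element types of the C++ signature, then broadcast all
    # three arguments to a common shape; the scalar z is replicated.
    xb, yb, zb = np.broadcast_arrays(
        x.astype(np.int32), y.astype(np.float32), np.float64(z))

    result = np.empty(xb.shape, dtype=np.float64)
    for idx in np.ndindex(xb.shape):
        result[idx] = my_func(xb[idx], yb[idx], zb[idx])

    print(calls, result.shape, result.dtype)
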
Sometimes we might want to explicitly exclude an argument from the vectorization
because it makes little sense to wrap it in a NumPy array. For instance,
suppose the function signature was

.. code-block:: cpp

    double my_func(int x, float y, my_custom_type *z);

This can be done with a stateful lambda closure:

.. code-block:: cpp

    // Vectorize a lambda function with a capture object (e.g. to exclude some arguments from the vectorization)
    m.def("vectorized_func",
        [](py::array_t<int> x, py::array_t<float> y, my_custom_type *z) {
            auto stateful_closure = [z](int x, float y) { return my_func(x, y, z); };
            return py::vectorize(stateful_closure)(x, y);
        }
    );

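
The same capture pattern can be tried out in Python with ``numpy.vectorize``,
keeping in mind that ``numpy.vectorize`` loops in the interpreter rather than
in compiled code. In this sketch, ``Settings`` is a hypothetical stand-in for
``my_custom_type`` and the body of ``my_func`` is invented for illustration.

.. code-block:: python

    import numpy as np

    class Settings:
        """Hypothetical stand-in for my_custom_type."""
        def __init__(self, offset):
            self.offset = offset

    def my_func(x, y, z):
        # Hypothetical body; only the C++ signature is given in the text.
        return x * y + z.offset

    def vectorized_func(x, y, z):
        # Capture z so that only x and y are looped over.
        closure = lambda xi, yi: my_func(xi, yi, z)
        return np.vectorize(closure, otypes=[np.float64])(x, y)

    result = vectorized_func(np.array([1, 2, 3]),
                             np.array([4.0, 5.0, 6.0]),
                             Settings(10))
    print(result)

The captured object is shared by every call, so it is never converted to or
from an array.
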
In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        auto buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        auto buf3 = result.request();

        double *ptr1 = (double *) buf1.ptr,
               *ptr2 = (double *) buf2.ptr,
               *ptr3 = (double *) buf3.ptr;

        for (size_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_PLUGIN(test) {
        py::module m("test");
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
        return m.ptr();
    }

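
For comparison, the behaviour expected of ``add_arrays`` can be written down as
a pure-Python reference. This sketch assumes NumPy is installed and that
``std::runtime_error`` surfaces in Python as ``RuntimeError``; it is meant for
checking results, not as a replacement for the compiled function.

.. code-block:: python

    import numpy as np

    def add_arrays(input1, input2):
        """Reference version with the same checks as the C++ function."""
        a = np.asarray(input1, dtype=np.float64)
        b = np.asarray(input2, dtype=np.float64)

        if a.ndim != 1 or b.ndim != 1:
            raise RuntimeError("Number of dimensions must be one")
        if a.size != b.size:
            raise RuntimeError("Input shapes must match")

        result = np.empty(a.size, dtype=np.float64)
        for idx in range(a.shape[0]):
            result[idx] = a[idx] + b[idx]
        return result

    total = add_arrays([1, 2, 3], [10.0, 20.0, 30.0])
    print(total)
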
.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.