{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Markov decision processes (MDPs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background\n",
"\n",
"In [Discrete-time Markov chains (DTMCs)](building_dtmcs.ipynb) we modelled Knuth-Yao’s model of a fair die by means of a DTMC.\n",
"In the following, we extend this model with a nondeterministic choice by building a Markov decision process.\n",
"\n",
"[01-building-mdps.py](https://github.com/moves-rwth/stormpy/blob/master/examples/building_mdps/01-building-mdps.py)\n",
"\n",
"First, we import Stormpy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> import stormpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transition Matrix\n",
"\n",
"Since we want to build a nondeterministic model, we create a transition matrix with a custom row group for each state:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder = stormpy.SparseMatrixBuilder(rows=0, columns=0, entries=0, force_dimensions=False, has_custom_row_grouping=True, row_groups=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need more than one row for the transitions starting in state 0 because a nondeterministic choice between two actions is available there.\n",
"Therefore, we start a new row group that will contain the rows representing the actions of state 0.\n",
"Note that the row group needs to be added before any entries are added to the group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(0)\n",
">>> builder.add_next_value(0, 1, 0.5)\n",
">>> builder.add_next_value(0, 2, 0.5)\n",
">>> builder.add_next_value(1, 1, 0.2)\n",
">>> builder.add_next_value(1, 2, 0.8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we have two nondeterministic choices in state 0.\n",
"With choice 0 we go to state 1 with probability 0.5 and to state 2 with probability 0.5.\n",
"With choice 1 we go to state 1 with probability 0.2 and to state 2 with probability 0.8.\n",
"\n",
"For the remaining states, we need to specify the starting rows of each row group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(2)\n",
">>> builder.add_next_value(2, 3, 0.5)\n",
">>> builder.add_next_value(2, 4, 0.5)\n",
">>> builder.new_row_group(3)\n",
">>> builder.add_next_value(3, 5, 0.5)\n",
">>> builder.add_next_value(3, 6, 0.5)\n",
">>> builder.new_row_group(4)\n",
">>> builder.add_next_value(4, 7, 0.5)\n",
">>> builder.add_next_value(4, 1, 0.5)\n",
">>> builder.new_row_group(5)\n",
">>> builder.add_next_value(5, 8, 0.5)\n",
">>> builder.add_next_value(5, 9, 0.5)\n",
">>> builder.new_row_group(6)\n",
">>> builder.add_next_value(6, 10, 0.5)\n",
">>> builder.add_next_value(6, 11, 0.5)\n",
">>> builder.new_row_group(7)\n",
">>> builder.add_next_value(7, 2, 0.5)\n",
">>> builder.add_next_value(7, 12, 0.5)\n",
"\n",
">>> for s in range(8, 14):\n",
"...     builder.new_row_group(s)\n",
"...     builder.add_next_value(s, s - 1, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we build the transition matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> transition_matrix = builder.build()"
]
},
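{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an addition to this guide, not part of the original example), we can inspect the dimensions of the resulting matrix. Assuming the nr_rows and nr_columns properties of the sparse matrix, we expect 14 rows (one per choice) and 13 columns (one per state):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> # Sketch: nr_rows and nr_columns are assumed properties of the built sparse matrix\n",
">>> print(transition_matrix.nr_rows, transition_matrix.nr_columns)"
]
},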
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Labeling\n",
"\n",
"We have seen the construction of a state labeling in previous examples. Therefore, we omit the description here.\n",
"Instead, we focus on the choices.\n",
"Since state 0 has a nondeterministic choice between two actions, the number of choices is 14 while the model has 13 states.\n",
"To distinguish these choices, we can define a choice labeling:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
">>> state_labeling = stormpy.storage.StateLabeling(13)\n",
">>> labels = {'init', 'one', 'two', 'three', 'four', 'five', 'six', 'done', 'deadlock'}\n",
">>> for label in labels:\n",
"...     state_labeling.add_label(label)\n",
"\n",
">>> state_labeling.add_label_to_state('init', 0)\n",
">>> state_labeling.add_label_to_state('one', 7)\n",
">>> state_labeling.add_label_to_state('two', 8)\n",
">>> state_labeling.add_label_to_state('three', 9)\n",
">>> state_labeling.add_label_to_state('four', 10)\n",
">>> state_labeling.add_label_to_state('five', 11)\n",
">>> state_labeling.add_label_to_state('six', 12)\n",
">>> state_labeling.set_states('done', stormpy.BitVector(13, [7, 8, 9, 10, 11, 12]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling = stormpy.storage.ChoiceLabeling(14)\n",
">>> choice_labels = {'a', 'b'}\n",
"\n",
">>> for label in choice_labels:\n",
"...     choice_labeling.add_label(label)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assign the label ‘a’ to the first action of state 0 and ‘b’ to the second.\n",
"Recall that these actions were defined in the first and second row of the transition matrix, respectively:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling.add_label_to_choice('a', 0)\n",
">>> choice_labeling.add_label_to_choice('b', 1)\n",
">>> print(choice_labeling) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reward models\n",
"\n",
"In this reward model, the length of the action reward vector coincides with the number of choices (14):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> reward_models = {}\n",
">>> action_reward = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
">>> reward_models['coin_flips'] = stormpy.SparseRewardModel(optional_state_action_reward_vector=action_reward)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Model\n",
"\n",
"We collect the components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> components = stormpy.SparseModelComponents(transition_matrix=transition_matrix, state_labeling=state_labeling, reward_models=reward_models, rate_transitions=False)\n",
">>> components.choice_labeling = choice_labeling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> mdp = stormpy.storage.SparseMdp(components)\n",
">>> print(mdp) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Partially observable Markov decision processes (POMDPs)\n",
"\n",
"To build a partially observable Markov decision process (POMDP),\n",
"components.observations can be set to a list of numbers that assigns an observation to each state."
]
}
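,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal, hedged sketch (the attribute and constructor names below follow the description above and the SparseMdp pattern, but are not verified against the stormpy API), all 13 states could share a single observation and the POMDP could then be built from the same components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> # Sketch only: the observations attribute and the SparsePomdp constructor are assumptions\n",
">>> components.observations = [0] * 13  # hypothetical: one observation index per state\n",
">>> pomdp = stormpy.storage.SparsePomdp(components)"
]
}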
],
"metadata": {
"celltoolbar": "Edit Metadata",
"date": 1598178167.234528,
"filename": "building_mdps.rst",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"title": "Markov decision processes (MDPs)"
},
"nbformat": 4,
"nbformat_minor": 4
}