{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Markov decision processes (MDPs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background\n",
"\n",
"In [Discrete-time Markov chains (DTMCs)](building_dtmcs.ipynb) we modelled Knuth-Yao’s model of a fair die by the means of a DTMC.\n",
"In the following we extend this model with nondeterministic choice by building a Markov decision process.\n",
"\n",
"[01-building-mdps.py](https://github.com/moves-rwth/stormpy/blob/master/examples/building_mdps/01-building-mdps.py)\n",
"\n",
"First, we import Stormpy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> import stormpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transition Matrix\n",
"\n",
"Since we want to build a nondeterminstic model, we create a transition matrix with a custom row group for each state:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder = stormpy.SparseMatrixBuilder(rows=0, columns=0, entries=0, force_dimensions=False, has_custom_row_grouping=True, row_groups=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need more than one row for the transitions starting in state 0 because a nondeterministic choice over the actions is available.\n",
"Therefore, we start a new group that will contain the rows representing actions of state 0.\n",
"Note that the row group needs to be added before any entries are added to the group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(0)\n",
">>> builder.add_next_value(0, 1, 0.5)\n",
">>> builder.add_next_value(0, 2, 0.5)\n",
">>> builder.add_next_value(1, 1, 0.2)\n",
">>> builder.add_next_value(1, 2, 0.8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we have two nondeterministic choices in state 0.\n",
"With choice 0 we have probability 0.5 to got to state 1 and probability 0.5 to got to state 2.\n",
"With choice 1 we got to state 1 with probability 0.2 and go to state 2 with probability 0.8.\n",
"\n",
"For the remaining states, we need to specify the starting rows of each row group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(2)\n",
">>> builder.add_next_value(2, 3, 0.5)\n",
">>> builder.add_next_value(2, 4, 0.5)\n",
">>> builder.new_row_group(3)\n",
">>> builder.add_next_value(3, 5, 0.5)\n",
">>> builder.add_next_value(3, 6, 0.5)\n",
">>> builder.new_row_group(4)\n",
">>> builder.add_next_value(4, 7, 0.5)\n",
">>> builder.add_next_value(4, 1, 0.5)\n",
">>> builder.new_row_group(5)\n",
">>> builder.add_next_value(5, 8, 0.5)\n",
">>> builder.add_next_value(5, 9, 0.5)\n",
">>> builder.new_row_group(6)\n",
">>> builder.add_next_value(6, 10, 0.5)\n",
">>> builder.add_next_value(6, 11, 0.5)\n",
">>> builder.new_row_group(7)\n",
">>> builder.add_next_value(7, 2, 0.5)\n",
">>> builder.add_next_value(7, 12, 0.5)\n",
"\n",
">>> for s in range(8, 14):\n",
"... builder.new_row_group(s)\n",
"... builder.add_next_value(s, s - 1, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we build the transition matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> transition_matrix = builder.build()"
]
},
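{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check, we can inspect the dimensions of the matrix we just built via its nr_rows and nr_columns attributes: it should have one column per state and one row per choice, i.e. two rows for state 0 and one row for each of the remaining twelve states:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> print(transition_matrix.nr_columns)  # 13 states\n",
">>> print(transition_matrix.nr_rows)  # 14 choices, since state 0 has two actions"
]
},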
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Labeling\n",
"\n",
"We have seen the construction of a state labeling in previous examples. Therefore we omit the description here\n",
"Instead, we focus on the choices.\n",
"Since in state 0 a nondeterministic choice over two actions is available, the number of choices is 14.\n",
"To distinguish those we can define a choice labeling:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
">>> state_labeling = stormpy.storage.StateLabeling(13)\n",
">>> labels = {'init', 'one', 'two', 'three', 'four', 'five', 'six', 'done', 'deadlock'}\n",
">>> for label in labels:\n",
"... state_labeling.add_label(label)\n",
"\n",
">>> state_labeling.add_label_to_state('init', 0)\n",
">>> state_labeling.add_label_to_state('one', 7)\n",
">>> state_labeling.add_label_to_state('two', 8)\n",
">>> state_labeling.add_label_to_state('three', 9)\n",
">>> state_labeling.add_label_to_state('four', 10)\n",
">>> state_labeling.add_label_to_state('five', 11)\n",
">>> state_labeling.add_label_to_state('six', 12)\n",
">>> state_labeling.set_states('done', stormpy.BitVector(13, [7, 8, 9, 10, 11, 12]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling = stormpy.storage.ChoiceLabeling(14)\n",
">>> choice_labels = {'a', 'b'}\n",
"\n",
">>> for label in choice_labels:\n",
"... choice_labeling.add_label(label)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assign the label ‘a’ to the first action of state 0 and ‘b’ to the second.\n",
"Recall that those actions where defined in row one and two of the transition matrix respectively:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling.add_label_to_choice('a', 0)\n",
">>> choice_labeling.add_label_to_choice('b', 1)\n",
">>> print(choice_labeling) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reward models\n",
"\n",
"In this reward model the length of the action rewards coincides with the number of choices:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> reward_models = {}\n",
">>> action_reward = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
">>> reward_models['coin_flips'] = stormpy.SparseRewardModel(optional_state_action_reward_vector=action_reward)"
]
},
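{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check, the length of this vector should equal the number of rows of the transition matrix built above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> print(len(action_reward) == transition_matrix.nr_rows)  # expect True (14 entries, one per choice)"
]
},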
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Model\n",
"\n",
"We collect the components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> components = stormpy.SparseModelComponents(transition_matrix=transition_matrix, state_labeling=state_labeling, reward_models=reward_models, rate_transitions=False)\n",
">>> components.choice_labeling = choice_labeling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> mdp = stormpy.storage.SparseMdp(components)\n",
">>> print(mdp) "
]
},
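{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the nondeterminism in the constructed model, we can walk over the states, their actions and the outgoing transitions (a small sketch using Stormpy's model traversal; state 0 should list two actions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> for state in mdp.states:\n",
"...     for action in state.actions:\n",
"...         for transition in action.transitions:\n",
"...             print('From state {}, with probability {}, go to state {}'.format(state, transition.value(), transition.column))"
]
},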
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Partially observable Markov decision process (POMDPs)\n",
"\n",
"To build a partially observable Markov decision process (POMDP),\n",
"components.observations can be set to a list of numbers that defines the status of the observables in each state."
]
}
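,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough sketch of this could look as follows. The concrete observation assignment is only an illustrative choice (the initial state, the intermediate states and each die outcome get their own observation class), and the attribute name follows the description above and may differ between Stormpy versions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> components.observations = [0, 1, 1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7]  # one entry per state (illustrative assignment)\n",
">>> pomdp = stormpy.storage.SparsePomdp(components)\n",
">>> print(pomdp)"
]
}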
],
"metadata": {
"celltoolbar": "Edit Metadata",
"date": 1598178167.234528,
"filename": "building_mdps.rst",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"title": "Markov decision processes (MDPs)"
},
"nbformat": 4,
"nbformat_minor": 4
}