{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Markov decision processes (MDPs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background\n",
"\n",
"In [Discrete-time Markov chains (DTMCs)](building_dtmcs.ipynb) we modelled Knuth-Yao’s model of a fair die by means of a DTMC.\n",
"In the following, we extend this model with a nondeterministic choice by building a Markov decision process.\n",
"\n",
"[01-building-mdps.py](https://github.com/moves-rwth/stormpy/blob/master/examples/building_mdps/01-building-mdps.py)\n",
"\n",
"First, we import Stormpy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> import stormpy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Transition Matrix\n",
"\n",
"Since we want to build a nondeterministic model, we create a transition matrix with a custom row group for each state:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder = stormpy.SparseMatrixBuilder(rows=0, columns=0, entries=0, force_dimensions=False, has_custom_row_grouping=True, row_groups=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need more than one row for the transitions starting in state 0 because a nondeterministic choice between two actions is available there.\n",
"Therefore, we start a new row group that will contain the rows representing the actions of state 0.\n",
"Note that the row group needs to be added before any entries are added to the group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(0)\n",
">>> builder.add_next_value(0, 1, 0.5)\n",
">>> builder.add_next_value(0, 2, 0.5)\n",
">>> builder.add_next_value(1, 1, 0.2)\n",
">>> builder.add_next_value(1, 2, 0.8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, we have two nondeterministic choices in state 0.\n",
"With choice 0 we go to state 1 with probability 0.5 and to state 2 with probability 0.5.\n",
"With choice 1 we go to state 1 with probability 0.2 and to state 2 with probability 0.8.\n",
"\n",
"For the remaining states, we need to specify the starting rows of each row group:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> builder.new_row_group(2)\n",
">>> builder.add_next_value(2, 3, 0.5)\n",
">>> builder.add_next_value(2, 4, 0.5)\n",
">>> builder.new_row_group(3)\n",
">>> builder.add_next_value(3, 5, 0.5)\n",
">>> builder.add_next_value(3, 6, 0.5)\n",
">>> builder.new_row_group(4)\n",
">>> builder.add_next_value(4, 7, 0.5)\n",
">>> builder.add_next_value(4, 1, 0.5)\n",
">>> builder.new_row_group(5)\n",
">>> builder.add_next_value(5, 8, 0.5)\n",
">>> builder.add_next_value(5, 9, 0.5)\n",
">>> builder.new_row_group(6)\n",
">>> builder.add_next_value(6, 10, 0.5)\n",
">>> builder.add_next_value(6, 11, 0.5)\n",
">>> builder.new_row_group(7)\n",
">>> builder.add_next_value(7, 2, 0.5)\n",
">>> builder.add_next_value(7, 12, 0.5)\n",
"\n",
">>> for s in range(8, 14):\n",
"...     builder.new_row_group(s)\n",
"...     builder.add_next_value(s, s - 1, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we build the transition matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> transition_matrix = builder.build()"
]
},
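{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (an addition to this guide, not part of the original example), we can inspect the dimensions of the resulting matrix. Assuming the nr_rows and nr_columns properties of the sparse matrix, we expect 14 rows (one per choice) and 13 columns (one per state):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> # Sketch: nr_rows and nr_columns are assumed properties of the built sparse matrix\n",
">>> print(transition_matrix.nr_rows, transition_matrix.nr_columns)"
]
},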
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Labeling\n",
"\n",
"We have seen the construction of a state labeling in previous examples. Therefore, we omit the description here.\n",
"Instead, we focus on the choices.\n",
"Since state 0 has a nondeterministic choice between two actions, the number of choices is 14 while the model has 13 states.\n",
"To distinguish these choices, we can define a choice labeling:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
">>> state_labeling = stormpy.storage.StateLabeling(13)\n",
">>> labels = {'init', 'one', 'two', 'three', 'four', 'five', 'six', 'done', 'deadlock'}\n",
">>> for label in labels:\n",
"...     state_labeling.add_label(label)\n",
"\n",
">>> state_labeling.add_label_to_state('init', 0)\n",
">>> state_labeling.add_label_to_state('one', 7)\n",
">>> state_labeling.add_label_to_state('two', 8)\n",
">>> state_labeling.add_label_to_state('three', 9)\n",
">>> state_labeling.add_label_to_state('four', 10)\n",
">>> state_labeling.add_label_to_state('five', 11)\n",
">>> state_labeling.add_label_to_state('six', 12)\n",
">>> state_labeling.set_states('done', stormpy.BitVector(13, [7, 8, 9, 10, 11, 12]))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling = stormpy.storage.ChoiceLabeling(14)\n",
">>> choice_labels = {'a', 'b'}\n",
"\n",
">>> for label in choice_labels:\n",
"...     choice_labeling.add_label(label)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We assign the label ‘a’ to the first action of state 0 and ‘b’ to the second.\n",
"Recall that these actions were defined in the first and second row of the transition matrix, respectively:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> choice_labeling.add_label_to_choice('a', 0)\n",
">>> choice_labeling.add_label_to_choice('b', 1)\n",
">>> print(choice_labeling) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reward models\n",
"\n",
"In this reward model, the length of the action reward vector coincides with the number of choices (14):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> reward_models = {}\n",
">>> action_reward = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n",
">>> reward_models['coin_flips'] = stormpy.SparseRewardModel(optional_state_action_reward_vector=action_reward)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building the Model\n",
"\n",
"We collect the components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> components = stormpy.SparseModelComponents(transition_matrix=transition_matrix, state_labeling=state_labeling, reward_models=reward_models, rate_transitions=False)\n",
">>> components.choice_labeling = choice_labeling"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We build the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> mdp = stormpy.storage.SparseMdp(components)\n",
">>> print(mdp) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Partially observable Markov decision processes (POMDPs)\n",
"\n",
"To build a partially observable Markov decision process (POMDP),\n",
"components.observations can be set to a list of numbers that assigns an observation to each state."
]
}
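,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal, hedged sketch (the attribute and constructor names below follow the description above and the SparseMdp pattern, but are not verified against the stormpy API), all 13 states could share a single observation and the POMDP could then be built from the same components:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
">>> # Sketch only: the observations attribute and the SparsePomdp constructor are assumptions\n",
">>> components.observations = [0] * 13  # hypothetical: one observation index per state\n",
">>> pomdp = stormpy.storage.SparsePomdp(components)"
]
}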
],
"metadata": {
"celltoolbar": "Edit Metadata",
"date": 1598178167.234528,
"filename": "building_mdps.rst",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
},
"title": "Markov decision processes (MDPs)"
},
"nbformat": 4,
"nbformat_minor": 4
}