Package: markovDP 0.99.0

markovDP: Infrastructure for Discrete-Time Markov Decision Processes (MDP)

Provides the infrastructure to work with Markov Decision Processes (MDPs) in R. The focus is on convenience in formulating MDPs, the support of sparse representations (using sparse matrices, lists and data.frames) and visualization of results. Some key components are implemented in C++ to speed up computation. Several popular solvers are implemented.
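To make concrete what an MDP solver computes, here is a minimal base-R sketch of value iteration on a made-up three-state problem. It does not use the package and is not how markovDP implements its solvers. The model, its numbers, and the helper name `value_iteration` are assumptions for illustration only.

```r
# Hypothetical toy MDP: 3 states, 2 actions (all numbers are invented).
states  <- c("s1", "s2", "s3")
actions <- c("stay", "go")

# P[[a]][s, s'] is the probability of moving from s to s' under action a.
P <- list(
  stay = diag(3),
  go   = matrix(c(0, 1, 0,
                  0, 0, 1,
                  0, 0, 1), nrow = 3, byrow = TRUE)
)

# R[s, a] is the immediate reward for taking action a in state s.
R <- matrix(c(0, 0,
              0, 1,
              0, 0), nrow = 3, byrow = TRUE,
            dimnames = list(states, actions))

gamma <- 0.9  # discount factor

# Illustrative helper (not a markovDP function): plain value iteration.
value_iteration <- function(P, R, gamma, tol = 1e-8, max_iter = 1000) {
  # Q[s, a] = R[s, a] + gamma * sum over s' of P[[a]][s, s'] * V[s']
  q_values <- function(V) {
    sapply(colnames(R), function(a) R[, a] + gamma * as.vector(P[[a]] %*% V))
  }
  V <- setNames(numeric(nrow(R)), rownames(R))
  for (i in seq_len(max_iter)) {
    V_new <- apply(q_values(V), 1, max)  # Bellman optimality update
    delta <- max(abs(V_new - V))
    V <- V_new
    if (delta < tol) break               # stop once the values stop changing
  }
  Q <- q_values(V)
  # Greedy policy; ties are broken in favor of the first action.
  policy <- setNames(colnames(Q)[max.col(Q, ties.method = "first")],
                     rownames(Q))
  list(V = V, policy = policy)
}

res <- value_iteration(P, R, gamma)
print(res$V)
print(res$policy)
```

For this toy model the values converge to 0.9, 1 and 0 for s1, s2 and s3, and the greedy policy is "go" in s1 and s2. In s3 both actions are equally good, so the tie-breaking rule picks "stay".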

Authors: Michael Hahsler [aut, cph, cre]

# Install 'markovDP' in R:
install.packages('markovDP', repos = c('https://mhahsler.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/mhahsler/markovdp/issues

Uses libs:
  • c++: GNU Standard C++ Library v3
Topics: control-theory, markov-decision-process, optimization, cpp

4.49 score | 7 stars | 11 scripts | 106 exports | 25 dependencies

Last updated from: e8ab3f595e. Checks: 13 OK. Indexed: yes.

Target                 Result  Time
linux-devel-arm64      OK      208
linux-devel-x86_64     OK      216
source / vignettes     OK      323
linux-release-arm64    OK      234
linux-release-x86_64   OK      253
macos-release-arm64    OK      156
macos-release-x86_64   OK      512
macos-oldrel-arm64     OK      210
macos-oldrel-x86_64    OK      389
windows-devel          OK      220
windows-release        OK      211
windows-oldrel         OK      212
wasm-release           OK      1276

Exports: A, absorbing_states, act, action, action_discrepancy, add_policy, approx_greedy_action, approx_greedy_policy, approx_Q_value, approx_V_plot, approx_value, available_actions, bellman_operator, bellman_update, colors_continuous, colors_discrete, convergence_horizon, create_basis_coefs, curve_multiple_directed, features2state, find_reachable_states, get_state_features, greedy_action, greedy_policy, gw_animate, gw_init, gw_matrix, gw_maze_MDP, gw_path, gw_plot, gw_plot_transition_graph, gw_random_maze, gw_rc2s, gw_read_maze, gw_s2rc, gw_transition_model, gw_transition_model_end_state, gw_transition_model_named, gw_transition_model_sparse, induced_reward_matrix, induced_transition_matrix, is_converged_MDP, is_solved_MDP, manual_policy, MDP, MDPSample, normalize_action, normalize_action_id, normalize_action_label, normalize_MDP, normalize_state, normalize_state_features, normalize_state_id, normalize_state_label, P_, pi_approx_linear, plot_transition_graph, plot_value_function, policy, policy_evaluation, policy_evaluation_bellman, policy_evaluation_LP, policy_evaluation_MC, q_approx_linear, Q_random, Q_values, Q_zero, R_, random_policy, reachable_states, regret, remove_unreachable_states, reward, reward_matrix, round_stochastic, s, S, sample_MDP, schedule_exp, schedule_exp2, schedule_harmonic, schedule_linear, schedule_log, solve_MDP, solve_MDP_APPROX, solve_MDP_DP, solve_MDP_LP, solve_MDP_MC, solve_MDP_PG, solve_MDP_SAMP, solve_MDP_TD, start_vector, state2features, transformation_fourier_basis, transformation_linear_basis, transformation_polynomial_basis, transformation_RBF_basis, transition_graph, transition_matrix, unreachable_states, v_approx_linear, V_random, V_zero, value_error, value_function, visit_probability

Dependencies: cli, codetools, cpp11, crayon, fastmap, float, foreach, glue, hms, igraph, iterators, lattice, lifecycle, lpSolve, magrittr, Matrix, MatrixExtra, pkgconfig, prettyunits, progress, R6, Rcpp, RhpcBLASctl, rlang, vctrs

Introduction to Discrete-Time Markov Decision Processes
Introduction | Markov Decision Processes | Definition | Notation Used in the Package | Package Functionality | Defining an MDP Problem | Solving an MDP | Toy Example: Stuart Russell's 4x3 Maze Gridworld MDP | Specifying the Stochastic Maze | Solving the Maze | Additional Functions | Access to Model Components | Value Function | Policy | Evaluation | Sampling | Acknowledgments | References

Last update: 2025-07-19
Started: 2024-05-31

Solving MDPs with Linear Approximation
Introduction | Linear Approximation | State-action Feature Vector Construction | Helper Functions | Episodic Semi-gradient Sarsa | Example 1: A maze without walls | Example 2: Stuart Russell's 3x4 Maze using Linear Approximation | Linear Basis | Order-1 Polynomial Basis | Radial Basis Function | Order-1 Fourier Basis | Example 3: Wall Maze | References

Last update: 2025-06-18
Started: 2025-02-20

Solving Tic-Tac-Toe as an MDP
Introduction | Defining Tic-Tac-Toe as an MDP | Actions | State Space | Transition Model | Reward Function | Constructing the MDP | Define Test Boards | Solve | LP | Value Iteration | Policy Iteration | Q-Learning | References

Last update: 2025-05-15
Started: 2024-09-01

Gridworlds as MDPs
Introduction | Defining a Gridworld | Working with Gridworld MDPs | Solving a Gridworld | Experimenting with Solvers | References

Last update: 2025-02-20
Started: 2025-02-20

Readme and manuals

Help Manual

Help page | Topics
Absorbing States | absorbing_states absorbing_states.MDP absorbing_states.MDPSample
Perform an Action | act act.MDPModel act.MDPSample
Choose an Action Given a Policy | action
Conversions for Action and State IDs and Labels | action_state_helpers features2state get_state_features normalize_action normalize_action_id normalize_action_label normalize_state normalize_state_features normalize_state_id normalize_state_label s state2features
Available Actions in a State | available_actions
Bellman Update and Bellman Operator | bellman_operator bellman_update
Cliff Walking Gridworld MDP | Cliff_walking cliff_walking
Default Colors for Visualization | colors colors_continuous colors_discrete
Estimate the Convergence Horizon for an Infinite-Horizon MDP | convergence_horizon
The Dyna Maze | DynaMaze dynamaze
Find Reachable State Space from a Transition Model Function | find_reachable_states
Greedy Actions and Policies | greedy_action greedy_policy
Helper Functions for Gridworld MDPs | gridworld gw gw_animate gw_init gw_matrix gw_maze_MDP gw_path gw_plot gw_plot_transition_graph gw_random_maze gw_rc2s gw_read_maze gw_s2rc gw_transition_model gw_transition_model_end_state gw_transition_model_named gw_transition_model_sparse
Linear Function Approximation | approx_value linear_function_approximation pi_approx_linear q_approx_linear v_approx_linear
Stuart Russell's 4x3 Maze Gridworld MDP | Maze maze
Define an MDP Problem | A is_converged_MDP is_solved_MDP MDP MDPModel P_ R_ S
Define an MDP With Only Sample Access | MDPSample
Extract, Create, or Add a Policy to a Model | add_policy induced_reward_matrix induced_transition_matrix manual_policy policy random_policy
Policy Evaluation | policy_evaluation policy_evaluation_bellman policy_evaluation_LP policy_evaluation_MC
Q-Values | Q_random Q_values Q_zero
Find Reachable States | reachable_states reachable_states.function reachable_states.MDPModel reachable_states.MDPSample
Regret of a Policy and Related Measures | action_discrepancy regret value_error
Calculate the Expected Reward of a Policy | reward reward.MDP
Round a Stochastic Vector or a Row-Stochastic Matrix | round_stochastic
Sample Trajectories from an MDP | sample_MDP sample_MDP.MDP
Sample Trajectories from an MDPSample | sample_MDP.MDPSample
Schedules to Reduce Alpha, Epsilon and Other Parameters | schedule schedule_exp schedule_exp2 schedule_harmonic schedule_linear schedule_log
Solve an MDP Problem | solve_MDP solve_MDP.MDP solve_MDP.MDPSample
Solve MDPs with Temporal Differencing with Function Approximation | approx_greedy_action approx_greedy_policy approx_Q_value approx_V_plot solve_MDP_APPROX
Solve MDPs using Dynamic Programming | solve_MDP_DP
Solve MDPs using Linear Programming | solve_MDP_LP
Solve MDPs using Monte Carlo Control | solve_MDP_MC
Solve MDPs with Policy Gradient Methods | solve_MDP_PG
Solve MDPs using Random-Sampling | solve_MDP_SAMP
Solve MDPs using Tabular Temporal Differencing | solve_MDP_TD
Sample a Start State | start start.MDPModel start.MDPSample
Transformation Functions for Linear Function Approximation | create_basis_coefs transformation transformation_fourier_basis transformation_linear_basis transformation_polynomial_basis transformation_RBF_basis
Transition Graph | curve_multiple_directed plot_transition_graph transition_graph
Access to Parts of the Model Description | accessors normalize_MDP reward_matrix start_vector transition_matrix
Unreachable States | remove_unreachable_states unreachable_states
Value Function | plot_value_function value_function V_random V_zero
State Visit Probability | visit_probability
Windy Gridworld MDP | Windy_gridworld windy_gridworld