Welcome to **Decision Analysis and Forecasting for Agricultural Development**. We are excited to deliver you this course content and to train a new generation of experts in applying these tools. We see enormous scope for application of these methods in agriculture and the need for more practitioners. The course has a dedicated interactive website but if you prefer you can also look at the simple html version.

Decisions in agriculture are often risky, meaning that decision makers have uncertainty about how their decision will influence the intended outcomes. Farming systems are dynamic and the impact of any intervention, policy or management decision is likely to be influenced by many factors ranging from soil and crop dynamics to social dynamics such as farmer and community perceptions. In order to provide scientific support for decision making it is important that our models and forecasts attempt to include these interactions. Decision Analysis provides a set of approaches that are aimed at capturing what is known and applying this knowledge to generate forecasts of decision outcomes. The theoretical body of approaches have been formally applied in business and economics for more than 50 years Howard and Abbas (2015) and have been gaining ground in agricultural development research more recently Luedeling and Shepherd (2016). Our mission is to normalize these holistic approaches and move agricultural science from the lab bench and field trials to real world application and decision support for farmers and agricultural decision makers such as those in government ministries and aid organizations.

**A student is not a vessel to be filled, but a lamp to be lighted…** - Plutarch

**What?**This is an experiential course. You (participants) will be leading discussions and projects.**So What?**The aim is for you to become a critical thinker and lead the process in generating useful forecasts for decision makers in agricultural development.**Now What?**You will build a working model of an agricultural development decision.

Here is a course overview:

In this course, we will provide an overview of methods we can use to study agricultural development decisions and generate forecasts of decision outcomes. If you do not have a clear idea what that is you’ve come to the right place. This course was developed for MSc. students in Crop Sciences at the University of Bonn, but the materials are also intended for a general audience interested in applying these tools. The main parts of the course include:

- Introduction to Decision Analysis
- Calibration training
- Participatory modeling building
- Decision modeling in R
- Group project on decision analysis

See the official pages in the coursebook here **Decision Analysis and Forecasting for Agricultural Development**.

After some introduction of the subject matter much of this course will consist of practical application of the `decisionSupport`

package in the R programming language (R Core Team 2021).

In this course, we aim to provide you with the skills and experiences in decision modeling. By the end of this course you will be able to:

develop decision models and comprehensively evaluate the findings using the

`R`

programming language with functions from the`decisionSupport`

package (Luedeling et al. 2021).recognize your own biases and provide accurate range estimates for uncertain variables.

analyze a decision context. You will be able to draw conclusions from a decision model and recommend steps forward.

develop decision models, comprehensively evaluate your findings and compose a report about the model you developed.

In this module, there will be no exam, but you’ll still have to put in a bit of effort to pass and get a good grade here. Over the course of the semester, you’ll be developing an Rmarkdown document, where you’ll record your learning and your coding work (here is an example from Hoa Do with an overview that includes the main modeling procedures with the decisionSupport package in R) (Luedeling et al. 2021). This document will contain short thematic chapters on the lecture contents, as well as the code you’ll produce and the outputs you’ll generate. The final chapters will contain some discussion of the results you’ll have obtained. This document will be evaluated, and it will be the major determinant of your final grade (participation in class will also count for some of the grade).

There will be about 60 hours of formal in-class time (15 ~ 4 hour lectures and seminars) and 120 hours of work outside the class.

Class participation will be about 50% of the work in this course. We will expect you to show up in class and be an active participant in discussions. We may also have short weekly tests for seeing how up to speed you are and we will track your project activity (follow your git repositories etc.).

Group work will make up the rest of the required work for this course. You will put together a report, build a repository with code and a working RMarkdown file. You will also do weekly reading/listening assignments and lead short related discussions.

In this course, we’ll gradually work our way into the `decisionSupport`

package (Luedeling et al. 2021). At the end of the semester, you should be able to apply state-of-the-art Decision Analysis. Yet even if it’s not your ambition to become a decision analyst, you’ll have picked up a bunch of useful skills along the way. We’ll try to teach you how to use some staple programming tools and techniques that can be used for all kinds of programming challenges. These will include the use of git and github and Rmarkdown, as well the the ability to create, manipulate and use R functions and packages.

What is expected of you is to be engaged in this class, and to diligently complete the assignments you receive. Learning things in R requires practice, through which many things that seem cumbersome at first eventually become almost automatic. We are hopeful that the things you’ll get exposed to in this class will be assets in your scientific (or other) career. So please take the materials seriously!

This course not only aims to teach you things about Decision Analysis and related topics - it also provides hands-on exercises to illustrate the functions of the `decisionSupport`

package Luedeling et al. (2021). For all of these practical components, we need tools. `decisionSupport`

is an R package, so we’ll need R, which is most comfortably operated through the RStudio interface.

We could simply start running RStudio on our local computer, save our files somewhere on our hard drive and generally operate the way we usually work with our computer. But this is not how real programmers work, and since this is what we’re trying to be, we should familiarize ourselves with some code development tools that such people use. We’ll therefore introduce you to git and github, which are very useful for keeping code organized and secure and for sharing with collaborators and wider audiences.

Finally, we want to show you how to properly document what you do in R, and how to compile professional-looking reports of your work. This is what Rmarkdown helps us do. Some of this may be a bit confusing in the beginning, but you’ll probably learn to appreciate the value of these tools, as we keep using them in this module.

The course will be largely based on group work. Please take some time to choose a group to work with and begin to think about a decision. Over the next few weeks you will be responsible for the following milestones:

Identify a decision

Collaborate with decision makers on decision model development (qualitative)

Generate model code from qualitative model and parameterize (quantitative step)

Generate a git repo with code and documentation

Write a ~ 5,000 word paper in RMarkdown, see this report on plotting high-dimensional data for a useful example of how this can be done.

In addition, each week small teams (groups of two or three) will be responsible for presenting and leading a discussion on a short paper or chapter. We expect everyone in the course to read the work and come prepared for the meeting. Please contact us if you cannot find the readings in the library or elsewhere. We will help you get a copy.

Bonus: to get more background on the teaching methods that inspire this course watch Sir Ken Robinson’s talk on Ted.com. If you like it you might enjoy reading his book ‘Out of Our Minds’ (Robinson 2017).

Lecture | Date | Materials | Reading |
---|---|---|---|

1 | Wednesday, April 10, 2024 | Introduction | Watch: Sir Ken Robinson / Hans Rosling (optional) |

2 | Friday, April 12, 2024 | Decision Analysis Overview | Hubbard (2014) (Chapter 1. The Challenge of Intangibles) |

3 | Wednesday, April 17, 2024 | Defining a Decision | Howard and Abbas (2015) (Chapter 1. Introduction to Quality Decision Making) |

4 | Friday, April 19, 2024 | Decision Analysis Case Studies | Shepherd et al. (2015) |

5 | Wednesday, April 24, 2024 | Participatory Methods | Luedeling and Shepherd (2016) |

6 | Friday, April 26, 2024 | Building Decision Models | Do, Luedeling, and Whitney (2020) |

7 | Friday, May 3, 2024 | Using Models to Create Forecasts | Tetlock and Gardner (2015) (Chapter 1. An Optimistic Skeptic) |

8 | Wednesday, May 8, 2024 | Bayesian Thinking | Bertsch McGrayne (2011) (Chapter 1. Causes in the Air) |

9 | Friday, May 10, 2024 | Calibrating experts | Kahneman and Egan (2011) (Chapter 1) |

10 | Friday, May 17, 2024 | Profile of a Decision Analyst | Savage and Markowitz (2009) (Chapter 1) |

Week | Date | Materials | Assignment |
---|---|---|---|

1 | Wednesday, May 28, 2024 | Using R and RStudio | Install R and Rstudio |

2 | Friday, May 31, 2024 | Using R and RStudio continued | Share R scripts (part 1) |

3 | Wednesday, June 5, 2024 | Group Work Overview | Share R scripts (part 2) |

4 | Friday, June 7, 2024 | Using RMarkdown | Share Rmarkdown file |

5 | Wednesday, June 12, 2024 | Using git and Github | Start a Github repo |

6 | Friday, June 14, 2024 | Model Programming | Share R scripts (part 3) |

7 | Wednesday, June 19, 2024 | Calibration training | Participate in calibration training |

8 | Friday, June 21, 2024 | Model functions | Share updated model |

9 | Wednesday, June 26, 2024 | Value of information | Share R scripts (part 3) |

10 | Friday, June 28, 2024 | Communicating Decision Support | |

11 | Wednesday, July 03, 2024 | Model forecasts | Share initial forecasts |

12 | Friday, July 5, 2024 | Citation Management | Join Zotero Group |

13 | Wednesday, July 10, 2024 | Guest Lecture | |

14 | Friday, July 12, 2024 | Consultation | |

15 | Wednesday, July 17, 2024 | Groups present / discuss final model | Present |

16 | Friday, July 19, 2024 | Presentations continued | Present |

*“Everything is vague to a degree you do not realize till you have tried to make it precise.”*

― Whitehead and Russell (2011)

Welcome to lecture 2 of **Decision Analysis and Forecasting for Agricultural Development**. In this lecture we will follow a six-part micro-series of videos and use the `#lecture-02-overview`

Slack channel and Zoom meeting to follow up. This will allow us all to have a chance to see your progress, provide feedback. It will be encouraging to others to see activity from colleagues. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Please take a few minutes to watch these six short videos and do the associated exercises. The slides will be shared in the respective Slack channels.

The introduction outlines the general theory and application of Decision Analysis tools. **At the end of the video please take a moment to think about decisions that are made in agricultural development**, ideally decisions that are relevant to your research interests. Please share these with us in the Slack channel.

**Remember to take a moment to gather your thoughts so far and share with us in Slack. **

Part 1 gives an overview of various approaches in agricultural research and of the Decision Analysis paradigm. At the end of the video please take a moment to think about a situation where Decision Analysis might be applicable. Try and identify a decision maker with a choice between different options and with uncertainty about the outcome.

**Remember to share a specific risky decision that might be supported with these tools. Share your ideas with us in Slack.**

Part 2 offers some background on the theory and development of Decision Analysis and how we apply it in agricultural decisions.

At the end of the video please take some time to think about a practical application of the kinds of approaches to dealing with uncertainty that are used in Decision Analysis. Describe a few variables that you consider difficult to measure but for which the effects on a decision might be possible to describe as uncertainty distributions. Share with us in Slack.

Part 3 gives an overview of model building processes and approaches.

**At the end of this video take some time to draw a decision impact pathway.** Draw a conceptual diagram of an agricultural development decision. Include the major factors that would influence the decision, no matter how difficult they might be to measure. Feel free to use any program you like to make this drawing. Loopy is a very nice interface. Google Slides, Biorender are also good options. For the more advance R users you might like the igraph library. For those who prefer analog, a photo or scan of a hand drawn diagram is also welcome. Be sure to save your work and share it with us in Slack, a screen shot is fine, a working link is even better.

Part 4 gives an overview of various types of cognitive biases and outlines how these can interfere with the definition of and parameterization of models. The session ends with a number of approaches that can be applied to help in overcoming these biases and outlines the calibration training process whereby experts are trained to give accurate estimates of their own uncertainty.

**At the end of the video please take some time to consider the variables in your conceptual diagram that might benefit from calibrated estimates.** Describe variables in your model that might be difficult to measure but the effects of which might be possible to describe with calibrated-expert derived uncertainty distributions.

Parts 5 and 6 offer some brief Decision Analysis case studies and conclusions. At the end of the video please take some time to revisit your model and consider these approaches in practice. Feel free to discuss in Slack about what application these approaches might have with regard to your research.

This week you will all read Hubbard (2014) (Chapter 1, pp 1-14). The first group will lead a discussion on the reading.

- Hubbard, Douglas W. How To Measure Anything: Finding the Value of Intangibles in Business. 2nd ed. Vol. Second Edition. Hoboken, New Jersey: John Wiley & Sons, 2014.
*Part 1:The measurement solution exists*

Bonus: to get an economist’s perspective on the Decision Analysis approaches watch ‘Lessons from applying decision science in the public health sector’ by Professor Karl Claxton in the Constructing a Digital Environment Webinar Series. The talk gets really interesting at around eight minutes in.

The following videos cover some of the challenges that researchers and agricultural development experts face when attempting to define decisions. Please watch the videos and answer the questions that follow.

Think about the following questions and jot down your thoughts.

**Does decision analysis make sense without involving a decision-maker?****What can this achieve, and what shouldn’t we expect from it?**

Think of challenges you may encounter when trying to do *credible*, *salient* and *legitimate* research (all of which decision analysts should aim to achieve).

This video outlines the background for quality decision making, closely following Howard and Abbas (2015). At the end of the video please take some time to sketch out some thoughts about a decision that you will be working on for the experiential part of this course. You can find the slides for the first three parts of the lecture here.

**Define the decision-maker**: The person, organization or group that will be faced with choices.**Define the decision choice and alternatives**: from which to choose**Define the preferences**: clarify how the decision-makers consider the different possible consequences and outcomes of the decision. These can take many forms, i.e. they can be personal, societal or cultural values.**Define the sources of Information**: What sources of information will you access to gather a causal understanding of the decision impact pathway. Be explicit about where you will reach out, i.e. specific farmers, literature etc.

This week you will all read Howard and Abbas (2015) (Chapter 1. Introduction to Quality Decision Making). One group will lead a discussion on the reading.

- Howard, Ronald A., and Ali E. Abbas. Foundations of Decision Analysis. NY, NY: Prentice Hall, 2015.

Bonus reading by Shepherd et al. (2015).

- Shepherd, Keith, Douglas Hubbard, Norman Fenton, Karl Claxton, Eike Luedeling, and Jan de Leeuw. “Development Goals Should Enable Decision-Making.” Nature 523, no. 7559 (2015): 152–54.

Welcome to lecture 4 of **Decision Analysis and Forecasting for Agricultural Development**. In this lecture we will follow three case studies and use the `#lecture-04-case-studies`

Slack channel and Zoom meeting to follow up. This will allow us all to have a chance to see your progress, provide feedback. As always, your engagement and sharing is appreciated. It will be encouraging to others to see activity from colleagues. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Please take a few minutes to watch these short videos and do the associated exercises before our meeting.

The following videos introduce some decision analysis case studies. Please watch the videos and answer the questions that follow.

The city of Wajir in Northern Kenya has lacks a reliable supply of clean drinking water and sanitation. To improve the situation, plans are being considered to construct a water pipeline from Habaswein over 100 km away (Luedeling and Leeuw 2014; Luedeling et al. 2015). Watch the following video to learn more.

Now try to answer the following questions:

This video shows the study Rojas et al. (2021) conducted in Chile for assessing the profitability of implementing plastic covers in sweet cherry orchards to protect the fruits from hazardous weather events.

Based on the question above, think about and write down two to three positive and negative implications of using the method we used to define the decision.

The data and scripts for this study are also available online (Fernandez et al. 2021).

- Luedeling, Eike, and Keith Shepherd. “Decision-Focused Agricultural Research.” Solutions 7, no. 5 (2016): 46–54.

The following video outlines some of the tools that can be used in participatory modeling. Watch the video and answer the questions that follow.

The following video covers some definitions, methods and tools and a case study related to stakeholder management in decision analysis. Please watch the videos and answer the questions that follow. These will be helpful in determining which tools/techniques you might use to identify stakeholders in your decision analysis process.

Luu et al

Liberating Structures by Keith McCandless (and others)

Reed, M. S. et al. (2009) ‘Who’s in and why? A typology of stakeholder analysis methods for natural resource management’, Journal of Environmental Management, 90(5), pp. 1933–1949. doi: 10.1016/j.jenvman.2009.01.001.

Welcome to lecture 6 of **Decision Analysis and Forecasting for Agricultural Development**. We will walk through brief examples and offer the scripts. Feel free to bring up any questions or concerns in the Slack, to Dr. Cory Whitney or to the course tutor.

Decision-makers often wish to have a quantitative basis for their decisions. However,‘hard data’ is often missing or unattainable for many important variables, which can paralyze the decision-making processes or lead decision-makers to conclude that large research efforts are needed before a decision can be made. That is, many variables decision makers must consider cannot be precisely quantified, at least not without unreasonable effort. The major objective of (prescriptive) decision analysis is to support decision-making processes faced with this problem. Following the principles of Decision Analysis can allow us to make forecasts of decision outcomes without precise numbers, as long as probability distributions describing the possible values for all variables can be estimated.

The `decisionSupport`

package implements this as a Monte Carlo simulation, which generates a large number of plausible system outcomes, based on random numbers for each input variable that are drawn from user-specified probability distributions. This approach is useful for determining whether a clearly preferable course of action can be delineated based on the present state of knowledge without the need for further information. If the distribution of predicted system outcomes does not imply a clearly preferable decision option, variables identified as carrying decision-relevant uncertainty can then be targeted by decision-supporting research. This approach is explained in more detail below and in the model programming seminar.

In this portion of the course we hope to introduce you to the methods and inspire you to follow a few important guidelines in the process of model building. One of the key aspects of model building has to do with making a solid business case for the model before programming.

Another important guideline is to start simple then move on to other steps. This ensures that you always have a working model at each of the steps in the process, i.e. starting with a skateboard rather than a car part as in this example from MetaLab.

Whitney et al. (2018) ‘Probabilistic decision tools for determining impacts of agricultural development policy on household nutrition’.

Ruett, Whitney, and Luedeling (2020) ‘Model-based evaluation of management options in ornamental plant nurseries’

Welcome to lecture 7 of **Decision Analysis and Forecasting for Agricultural Development**. We will walk through brief examples and offer the scripts. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

- Tetlock, Philip E., and Dan Gardner. Superforecasting: The Art and Science of Prediction. New York, NY: Crown Publishers, 2015.(Chapter 1. An Optimistic Skeptic)

Welcome to the lecture on Bayesian modeling with me, Dr. Katja Schiffers. Although we mainly implement these methods in our group for decision analysis, we do not want to withhold the Bayesian Modeling approaches from you. Originally, I prepared this introduction for a seminar on the analysis of experimental data, so the focus is more on comparing frequentist and Bayesian statistics than on decision models. In this lecture we will use the `brms`

package and use the `#lecture_08_01_practical`

Slack channel to follow up. This will allow us all to have a chance to see your progress, provide feedback. As always, your engagement and sharing is appreciated. It will be encouraging to others to see activity from colleagues. Feel free to bring up any questions or concerns in Slack or to Dr. Cory Whitney or the course tutor.

The following text heavily relies on a very well-written article “Short introduction to Bayesian statistics with R for ornithologists” (2016) by Fränzi Korner-Nievergelt & Ommo Hüppop in ‘Die Vogelwarte’. At the end of the lecture, you will know

- the principle of Bayesian statistics,
- ways to apply it using the R package ‘bmrs’, and
- the differences between frequentist and Bayesian statistics regarding the interpretation of results.

No introduction to Bayesian statistics is complete without Bayes’ theorem. Thomas Bayes was an English clergyman, philosopher, and statistician who, in 1763 published a manuscript entitled “An essay towards solving a problem in the doctrine of chances.” In this work, he describes how, based on existing knowledge and additional observations **X**, the probability can be calculated that a hypothesis **H** is true:

\[P(H|X) = \frac {P(X|H)×P(H)}{P(X)}\]

In words, the probability (P) that the hypothesis is true, given the observed data (P(H|X), **a-posteriori** knowledge), is equal to the probability of observing the data, assuming the hypothesis is true (P(X|H), **likelihood**), times the prior probability that the hypothesis is true before observing the data (P(H), **a-priori** knowledge or **prior**), divided by the probability of observing the data regardless of any assumptions about the hypothesis (P(X), normalization constant).

It is important to note that hypotheses are expressed as parameter values. For example, if our hypothesis is that the mean of a dependent variable is greater under treatment A than under treatment B, we would calculate the probability for the parameter mean(A) - mean(B) and obtain the probability that it is greater than zero.

In Bayesian statistics, this idea is used to combine prior knowledge from experts/literature (P(H)) with the information contained in newly collected data X to generate the updated a-posteriori knowledge. We also see that as a result, we obtain the probability of our hypothesis. This is much more digestible and closer to our “normal” way of thinking than interpreting a p-value: **“The probability of finding data as extreme as this or more extreme, if the null hypothesis is true.”**

**Excursion: Probability distributions**

To represent probabilities, such as prior knowledge, probability distributions are used. In the following figure, you can see such a probability distribution: the higher the density, the more likely the parameter value plotted on the x-axis. The total area under the curve always adds up to 1 (exactly some parameter value must always be true).

So how do we calculate this probability? Unfortunately, it’s usually not as straightforward as it seems. While we can usually calculate the likelihood (this is also done in frequentist statistics to determine the parameter values of our models) and establish our a-priori knowledge through a probability distribution, things get tricky with the P(X) part, the probability of the data, at least once we no longer have very simple models. To do this, we would have to integrate the probability of the data over all possible parameter values, and this is often not possible. Fortunately, this problem can be overcome with a simulation technique developed in the 1980s and 1990s:

MCMC methods can approximate probability distributions (in this case, P(H|X)) that cannot be analytically computed. The most intuitive and quite ingenious explanation I found for this is by Michael Franke on his Github page at https://michael-franke.github.io/intro-data-analysis/Ch-03-03-estimation-algorithms.html. Here are some parts of it:

**How the Apples Get to the Trees**

Every year in spring, Mother Nature sends out her children to distribute apples on the apple trees. For each tree, the number of apples should be proportional to the number of leaves: Giant Ralph with twice as many leaves as Skinny Daniel should also have twice as many apples in the end. So if there are a total of \(n_a\) apples, and \(L(t)\) is the number of leaves on tree \(t\), each tree should receive \(A(t)\) apples.

\[A(t) = \frac {L(t)}{\sum L(t')}n_a\]

The problem is that Mother Nature cannot calculate \(\sum L(t')\) (the normalization constant), so she doesn’t know how many leaves all the trees have together.

The children (Markov chains) can count, however. They can’t visit all the trees and they can’t remember the numbers for very long, but they can do it for two trees at a time. Here’s what they do: they each start at a random tree (parameter value), already put an apple in it, count the leaves \(L(t_1)\) there and then look for a nearby tree from which they can also count the leaves \(L(t_2)\) (the denominator of our distribution). If the number of leaves in the second tree is higher, they will definitely go there and put an apple in it. If it’s lower, they will only go there with a probability of \(L(t_2)/L(t_1)\) and put an apple there. They keep walking back and forth between the trees, and in the end, the apples are roughly evenly distributed. The frequency of visits by the children has therefore approximated the desired distribution (which Mother Nature couldn’t calculate)!

MCMC methods do the same thing: an MCMC chain randomly draws values for the model parameters and computes `Result1 = Likelihood of the data * Prior for them`

. Then it draws values randomly around the first ones and computes `Result2 = Likelihood of the data * Prior for them`

. If `Result2`

is higher than `Result1`

, it jumps there and draws new parameter values from there. If `Result2`

is lower, it only jumps there with a probability of `Result2/Result1`

. Then, values are drawn randomly again, and so on. In the following figure, successful jumps are represented by blue arrows and rejected jumps by green arrows.

If this process is continued long enough, the distribution of the blue dots approaches the posterior probability distribution of the parameters that we want: We have combined a priori knowledge with the information contained in the newly collected data to successfully obtain posterior knowledge!

We need to decide which chicken breed we want to keep in our orchard. We already have 3 Kraienköppe and 6 Niederrheiner. We prefer the Niederrheiner because they are less aggressive, but the Kraienköppe seem to have a slightly higher egg-laying performance. Is it really advisable to choose Kraienköppe because of this tendency?

**1: Definition of the A-priori-probability distribution**

We inquire about the egg production of the two breeds and learn that Niederrheiners lay between 190 and 210 eggs per year, while Kraienkoppes lay between 200 and 220. Accordingly, we formulate our prior probability distributions using the `brms`

package:

`library(brms)`

To get the `brms`

library to work as expected you may need the latest `rstan`

and `StanHeaders`

. Run `install.packages("rstan")`

. `install.packages("StanHeaders")`

. You may also need to set up the connection with these libraries with `options(mc.cores = parallel::detectCores())`

and `rstan_options(auto_write = TRUE)`

.

*We define 2 normal distributions using the function set_prior, one with a mean of 200 and a standard deviation of 1.5, and another with a mean of 210 and a standard deviation of 1.5. We specify under ‘coef’ which parameters in the model that we later formulate these priors belong to.*

`priors <- c(set_prior("normal(200, 5)", coef = "rasseN"), set_prior("normal(210, 5)", coef = "rasseK"))`

*Plot them, to get a feeling of the priors:*

`curve(dnorm(x, 200, 5), from = 170, to = 230, xlab="egg production", ylab="density")`

**2. Collecting new data**

Now we report our own observations of the egg-laying performance of the 3 Kraienköppe and 6 Niederrheiners from the previous year:

`rasse <- c("K", "K", "K", "N", "N", "N", "N", "N", "N")`

`ll <- c(225, 222, 221, 219, 219, 216, 221, 218, 217)`

`daten <- data.frame(rasse=rasse, ll=ll)`

**3. Combining the prior distribution with the data to obtain the posterior distribution**

To estimate the posterior distribution, we use the ‘brm’ function of the `brms`

package. As we know from other evaluations, we first formulate our model, namely that the egg-laying performance `ll`

depends on the breed. The `-1`

in the model formula causes the model to estimate the mean egg-laying performance for both breeds rather than just the mean for the first breed and the difference to it. Under `data`

, we enter our own collected data and under `prior`

, we define the above priors. The `silent`

argument determines how much information is output to the console when the MCMC chains start running. Depending on the complexity of the model, these simulations can take several seconds or minutes.

*Make the model*

`legeleistung <- brms::brm(formula = ll ~ rasse -1, data = daten , prior = priors, silent = 2)`

**4. Interpretation**

The above code has performed the MCMC simulation and we can now examine the posterior distributions for the laying performance of the chicken breeds:

*Print the summary* `summary(legeleistung)`

The summary first shows us our inputs and some information about the Markov chains. Most interesting to us are, of course, the estimates for rasseK and rasseN and their credibility intervals. As you can see, the confidence intervals overlap: the lower value of rasseK (l-95%) is about 218, which is smaller than the upper value (u-95%) of about 219 for rasseN, which lays fewer eggs.

Sigma is the estimated standard deviation of the assumed normal distribution and is less relevant to us here.

`plot(legeleistung)`

In the figure, the results are even easier to interpret. On the left, you can see the a-posteriori distributions for both breeds (note that the x-axes of the top two distributions are not aligned). To the right of that are the dynamics of the Markov chains.

From the a-posteriori distributions, we could now also calculate the exact probability that the egg-laying performance really is different. However, since we have already seen that the credibility intervals overlap, we stick to our preference for the Niederrheiners, even though it is not ruled out that they lay a few eggs less.

Frequentist | Bayesian |
---|---|

Probability of data, in relation to hypothesis | only yes/no answer |

Confidence interval: in which interval do we expect 95% of the means of further random samples of the same sample size | Credibility interval: in which range of values does the true population mean lie with a probability of 95% |

Assumption that there is one ‘true’ value, but observations lead to a distribution |
Assumption that the observed values are true and have an intrinsic variance |

The easier interpretability of Bayesian analyses is best illustrated by the following figure:

An uncritical frequentist approach would result in: “The effect in group/treatment A and E is significant, so we will use the tested fertilizer for those groups. There seems to be no effect in the other groups.”

However, there is an important effect in group B as well, and even though the effect in group E is statistically significant, the effect size on yield is so small that it probably does not make economic sense to fertilize additionally.

The results of the Bayesian analysis (i.e., the probabilities for the hypothesis that there is an effect) much more directly reflect these interpretations.

Welcome to **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Please watch the video and answer the multiple choice questions below.

Summarize for yourself the difference between the frequentist and the Bayesian approach to knowledge generation. Which one do you feel more comfortable about? Are you ready to become a Bayesian?

- Bertsch McGrayne (2011) `The theory that would not die’ (Chapter 1. Causes in the Air)

Welcome to lecture 9 of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

The following videos tell you something about a method which is frequently used in our group to get reliable values for the input tables from our stakeholders. Please watch the videos and answer the questions that follow.

**Chapter 1.**Kahneman, Daniel, and Patrick Egan. Thinking, Fast and Slow. Vol. 1. New York: Farrar, Straus and Giroux, 2011.

Watch the video and do the awareness test!

Have a look at Terry Heick’s Cognitive Bias codex.

Welcome to lecture 10 of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Those who work in Decision Analysis must be able to do a number of diverse jobs from the facilitation and integration of ideas into models and also in programming these models. In the following short lecture you will learn about the skills that are particularly important for *integration* of knowledge, *facilitation* of knowledge gathering processes and for *programming* decision models. Please watch the video and answer the multiple choice questions below.

This week we will read and discuss Chapter 1 of ‘The Flaw of Averages’ by Savage and Markowitz (2009).

-Savage, Sam L., and Harry M. Markowitz. The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. John Wiley & Sons, 2009.

Sam Savage’s article The Flaw of Averages in the Harvard Business Review.

Welcome to **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Remember that this is an experiential course and will be based largely on work from your side that is supported by us, rather than passive learning where we deliver content. In order to work with us in this course you will need to have some basic tools.

The first step in getting ready for the work of this course is to install and run R (R Core Team 2021) and RStudio.

If you already have R and RStudio loaded then you should be able to follow parts our R tutorial. Please follow Data Wrangling Part one and Data Wrangling Part two portions of the tutorial. Share the results of your work with the ‘ggplot2’ ‘diamonds’ data (Wickham et al. 2021) after you have worked through these materials.

- Check out the R Studio Tidyverse style guide

You should already have installed R and RStudio. If not, please check out the tools we use in this course. If you already have R and RStudio loaded then you should be able to follow this Data Visualization section of the R tutorial. Share the results of your work with a data set of your choice (i.e. `ggplot2`

`diamonds`

data (Wickham et al. 2021)) after you have worked through these materials.

If you run the code `usethis::use_blank_slate()`

in your console it will set your RStudio preference to **NEVER** save or restore `.RData`

on exit or on startup, which is a lifestyle choice endorsed by many excellent researchers like Jenny Bryan.

Participants shared conceptual models and recieved feedback.

See Slack

RMarkdown: overview

Now let’s check out Rmarkdown, a powerful tool that allows making fancy reports, websites etc. out of our R code (Allaire et al. (2021)). You’re already looking at an example, by the way. This website was produced with Rmarkdown (and it wasn’t hard at all)!

Here are the slides and the html file generated in the video.

To run this you will need to load `library(rmarkdown)`

and library(`knitr`

)

- Use RMarkdown!
- You will thank yourself
- Save often with new version (or use git)

- Use a ‘?’ with function, package, data name in the R console
- If you get stuck, use Google: just add “R” to a search e.g. a copy of an error message
- Someone else has also been confused by it and has written about it
- Thousands of talented programmers scan the web and answer these problems

https://rmarkdown.rstudio.com/authoring_basics.html

https://bookdown.org/yihui/rmarkdown/r-code.html

- Open RStudio

- Open an Rmarkdown file and knit.

Now we’re equipped with all the basic tools for this course. We’ll start using them pretty soon. Throughout the course you’ll find occasional references to your RMarkdown file. This is supposed to be produced with Rmarkdown, with subsequent versions of it stored on Github.

Welcome to the fourth seminar of **Decision Analysis and Forecasting for Agricultural Development**. All your programming and project work for this course should take place in the git environment and this seminar is intended to show you how to make that happen. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Now we look at the programming version control environment git and the interface github, which we use to share our scripts and changes to our work.

Install Git & join Github (if you have not already):

- install Git https://git-scm.com/downloads

**Important note for Windows users**: when you download Git, it may be installed into your `Program Files`

directory by default. This often causes issues later. The general recommendation is to choose a different directory for installing, i.e. `C:/Users/username/bin/`

. Once Git is installed, open RStudio, and go to `Tools > Global Options`

, select `Git/SVN`

, and then enter or select the path for your Git executable.

**join Github**https://github.com/

Some useful tips on getting these running from our friends

**University of Zurich**http://www.geo.uzh.ch/microsite/reproducible_research/post/rr-rstudio-git/**Nathan Stephens**, a very helpful R blogger

https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN**R For the Rest of us**Learn the essentials of using Git and GitHub with RStudio. This guide explains how to integrate version control into your R projects for better collaboration. https://rfortherestofus.com/2021/02/how-to-use-git-github-with-r/

Start simple with your own repository only and work on it alone. Share your profile and link to a repository you made in the Slack channel for this seminar. Soon you will start to collaborate on something in git. For the project part of this course your team will work in a collective repository and work together to generate the final working project model and report.

We can use the options from `htmlpreview.github.io`

to show the results of any of our html output work that is stored in our git repository.

To do that just preface the path for the html file with `http://htmlpreview.github.io/?`

.

For example the slides for this seminar can be found in the `CWWhitney/teaching_git`

repository.

The full path is: `https://github.com/CWWhitney/teaching_git/blob/master/R_Git_GitHub.html`

and the link to this to be viewed directly as a web page is: http://htmlpreview.github.io/?https://github.com/CWWhitney/teaching_git/blob/master/R_Git_GitHub.html

Best practices for using Git from deepsource.io. Note that we break rule number five when we share the html files in the Git repository. That can be our secret.

Watch the ‘What is GitHub’ video from the GitHub developers

Welcome to the model programming seminar of **Decision Analysis and Forecasting for Agricultural Development**. In this seminar we will look into different options for model programming in the `R`

programming environment (R Core Team 2021). In case you have not already installed `R`

and `RStudio`

you may want to go back to the Using R and Rstudio seminar and follow the instructions there. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Remember to start simple then move on to other steps, make sure that you always have a working model at each of the steps in the process, i.e. starting with a skateboard rather than a car part as in this example from MetaLab.

Follow this hypothetical example of a silvopastoral system as your first Monte Carlo simulation in base R.

In this video we walk you through the process of running this model. Please follow along and write your own R script. If you ask nicely we might also share the R script.

Welcome to the second seminar on model programming **Decision Analysis and Forecasting for Agricultural Development**. In this seminar we will look into different options for model programming in the `R`

programming environment (R Core Team 2021). In case you have not already installed `R`

and `RStudio`

you may want to go back to the Using R and Rstudio seminar and follow the instructions there. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

As we have discussed in previous lectures, when deciding on intervention in complex systems, many of the variables decision makers need to consider are difficult or impossible to precisely quantify. The `decisionSupport`

package (Luedeling et al. 2021) can make use of uncertainty around these relationships and implement a Monte Carlo simulation, which generates a large number of plausible system outcomes. The model procedure draws random numbers for each input variable, which are drawn from specified probability distributions. The method requires two inputs to the `mcSimulation`

function:

an R function that predicts decision outcomes based on the variables named in a separate data table. This R function is customized by the user to address a particular decision problem.

an input table (normally in

`comma separated values`

format, ending in`.csv`

) specifying the names and probability distributions for all variables used in the decision model. These distributions aim to represent the full range of possible values for each component of the model.

The `mcSimulation`

function from the `decisionSupport`

package conducts a Monte Carlo analysis with repeated model runs based on probability distributions for all uncertain variables. The data table and model are customized to fit the particulars of a specific decision.

We have already built some simple models in the Decision Models lecture and the Model programming seminar.

First we generate a model as a function. We use the `decisionSupport`

functions `vv()`

to produce time series with variation from a pre-defined mean and coefficient of variation, `chance_event()`

to simulate whether events occur and `discount()`

to discount values along a time series.

We have just learned about the processes and methods of generating Decision Models. Now we can explore some of the packages that we commonly use in `R`

. The main package that our team uses for simulations is the `decisionSupport`

. In order to use the standard tools for running these models in the R environment you will need to load the `decisionSupport`

library. Use the `install.packages`

function, i.e. `install.packages("decisionSupport")`

.

Here we generate an input table to feed the model function. Update this to include the management cost variable in the graphical impact pathway above. Call your new variable `"Management_cost"`

, make the lower bound `100`

and the upper bound `2000`

, make the distribution `"posnorm"`

, make the label `"Management cost (USD/ha)"`

and make the description `"Management costs in a normal season"`

.

```
input_estimates <- data.frame(variable = c("Yield", "Market_price", "Labor_cost"),
lower = c(6000, 3, 500),
median = NA,
upper = c(14000, 8, 1000),
distribution = c("posnorm", "posnorm", "posnorm"),
label = c("Yield (kg/ha)", "Price (USD/kg)", "Labor cost (USD/ha)"),
Description = c("Yield in a sweet cherry farm under normal conditions",
"Price of sweet cherry in a normal season",
"Labor costs in a normal season"))
input_estimates
```

```
input_estimates <- data.frame(variable = c("Yield", "Market_price", "Labor_cost", "Management_cost"),
lower = c(6000, 3, 500, 100),
median = NA,
upper = c(14000, 8, 1000, 2000),
distribution = c("posnorm", "posnorm", "posnorm", "posnorm"),
label = c("Yield (kg/ha)", "Price (USD/kg)", "Labor cost (USD/ha)", "Management cost (USD/ha)"),
Description = c("Yield in a sweet cherry farm under normal conditions",
"Price of sweet cherry in a normal season",
"Labor costs in a normal season",
"Management costs in a normal season"))
input_estimates
```

Here we use the `mcSimulation`

function from the `decisionSupport`

package to implement a model (Luedeling et al. 2021). The model function that describes the graphical impact pathway. Add a new line of code that summarizes the `Labor_cost`

and `Management_cost`

into `overall_costs`

, then subtract these from the `income`

to calculate `final_result`

.

```
model_function <- function(){
# Estimate the income in a normal season
income <- Yield * Market_price
# Estimate the final results from the model
final_result <- income - Labor_cost
# Generate the list of outputs from the Monte Carlo simulation
return(list(final_result = final_result))
}
# Run the Monte Carlo simulation using the model function
example_mc_simulation <- mcSimulation(estimate = as.estimate(input_estimates),
model_function = model_function,
numberOfModelRuns = 800,
functionSyntax = "plainNames")
example_mc_simulation
```

```
model_function <- function(){
# Estimate the income in a normal season
income <- Yield * Market_price
overall_costs <- Labor_cost + Management_cost
# Estimate the final results from the model
final_result <- income - overall_costs
# Generate the list of outputs from the Monte Carlo simulation
return(list(final_result = final_result))
}
# Run the Monte Carlo simulation using the model function
example_mc_simulation <- mcSimulation(estimate = as.estimate(input_estimates),
model_function = model_function,
numberOfModelRuns = 800,
functionSyntax = "plainNames")
example_mc_simulation
```

Here we show the results of a Monte Carlo simulation (800 model runs) for estimating the profits in sweet cherry orchards.

Change the plot to a histogram by using the `method`

argument in the `plot_distributions`

function.

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "boxplot_density",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "hist_simple_overlay",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

`make_variables`

You could simply start further developing the decision model now, but since the model function will be designed to make use of variables provided to it externally (random numbers drawn according to the information in the data table), you will need to define sample values for all variables, if you want to test pieces of the function code during the development process. This can be done manually, but it’s more easily accomplished with the following helper function `make_variables`

:

```
make_variables <- function(est,n=1)
{ x<-random(rho=est, n=n)
for(i in colnames(x)) assign(i,
as.numeric(x[1,i]),envir=.GlobalEnv)
}
```

This function is not included in the `decisionSupport`

package, because it places the desired variables in the global environment. This is not allowed for functions included in packages on R’s download servers.

Applying `make_variables`

and `as.estimate`

to the data table (with default setting `n=1`

) generates one random number for each variable, which then allows you to easily test the code you are developing. Try running this function on your code as you build the decision function. This allows for testing the values within a model rather than running the full model.

Run the `make_variables`

and `as.estimate`

on the `input_estimates`

input table that we created and then calculate the result of `Labor_cost + Management_cost`

given a single random value for these variables. **Note that each time you run this code it generates a new random draw and produces a different number from within the range for the variables in the input table.**

```
make_variables(as.estimate(input_estimates))
Market_price
```

```
make_variables(as.estimate(input_estimates))
Labor_cost + Management_cost
```

Once you have followed and run the code above on your machine it is a good time to look through the outline of these procedures in the `decisionSupport`

vignette on the CRAN called ‘Applying the mcSimulation function in decisionSupport’ (Fernandez, Whitney, and Luedeling 2021). This will show you how the tools can be applied for comparing decision outcomes. It runs a Monte-Carlo-based selection of sedimentation management strategies for a reservoir in the Upper Volta River Basin of Burkina Faso (Lanzanova et al. 2019). Tip: If you want to play with this code you can find the Rmarkdown file and the estimate table in the `decisionSupport`

GitHub repository.

Taken together, all these outputs allow an evaluation of the plausible range of net benefits that can be expected to arise from the decision. They provide a recommendation on which decision option should be preferred, and an appraisal of which input variables are responsible for most of the variation in the output distribution.

Listen to this interview on Alex Andorra’s podcast ‘Learning Bayesian Statistics’, How to use Bayes in industry, with Colin Carroll

Before you start developing the decision model (an R function), open R and download and load the `decisionSupport`

package.

The `mcSimulation`

function from the `decisionSupport`

package can be applied to conduct decision analysis (Luedeling et al. 2021). The function requires three inputs:

- an
`estimate`

of the joint probability distribution of the input variables. These specify the names and probability distributions for all variables used in the decision model. These distributions aim to represent the full range of possible values for each component of the model. - a
`model_function`

that predicts decision outcomes based on the variables named in a separate data table. This R function is customized by the user to address a particular decision problem to provide the decision analysis model. `numberOfModelRuns`

indicating the number of times to run the model function.

These inputs are provided as arguments to the `mcSimulation`

function, which conducts a Monte Carlo analysis with repeated model runs based on probability distributions for all uncertain variables. The data table and model are customized to fit the particulars of a specific decision.

`estimate`

To support the model building process we design an input table to store the `estimate`

values. The table is stored locally as `example_decision_inputs.csv`

and contains many of the basic values for the analysis. This table contains all the input variables used in the model. Their distributions are described by 90% confidence intervals, which are specified by lower (5% quantile) and upper (95% quantile) bounds, as well as the shape of the distribution. This example uses four different distributions:

`const`

– a constant value`norm`

– a normal distribution`tnorm_0_1`

– a truncated normal distribution that can only have values between 0 and 1 (useful for probabilities; note that 0 and 1, as well as numbers outside this interval are not permitted as inputs)`posnorm`

– a normal distribution truncated at 0 (only positive values allowed)

For a full list of possible distributions, type `?random.estimate1d`

in your R console. When specifying confidence intervals for truncated distributions, note that approximately 5% of the random values should ‘fit’ within the truncation interval on either side. If there is not enough space, the function will generate a warning (usually it will still work, but the inputs may not look like you intended them to).

We have provided default distributions for all the variables used here, but feel free to make adjustments by editing the .csv file in a spreadsheet program. You can download the ‘example_decision_inputs.csv’ here and save it as a .csv file locally (from your web browser try: File > Save page as). In this example it is stored in the ‘data’ folder. Once you have downloaded the data you can run `example_decision_inputs <- read.csv("data/example_decision_inputs.csv")`

in your console to make the following code work on your machine. Note that the input table that can be written or read from a .csv file and calculated with the `estimate_read_csv`

function or converted to the correct format with the `as.estimate`

function.

`model_function`

The decision model is coded as an R function which takes in the variables provided in the data table and generates a model output, such as the Net Present Value.

Here we define a simple model function that we call `example_decision_model`

. It calculates profits as benefits minus costs and arbitrarily adds 500 to the result to arrive at `final_profits`

. This simple example shows us how to use `function`

, the results of which can be applied elsewhere. The most basic way to apply the library is to use the `mcSimulation`

function to run our `example_decision_model`

. Here we run it 100 times using inputs from the `example_decision_inputs.csv`

table.

Update this model by adding `additional_benefits`

, one of the variables from the `example_decision_inputs`

data, to replace the 500 that was arbitrarily added to the profit and changing the number of model runs to 700.

```
example_decision_model <- function(x, varnames){
profit <- benefits-costs
final_profits <- profit + 500
return(final_profits)
}
mcSimulation(estimate = as.estimate(example_decision_inputs),
model_function = example_decision_model,
numberOfModelRuns = 100,
functionSyntax = "plainNames")
```

```
example_decision_model <- function(x, varnames){
profit <- benefits-costs
final_profits <- profit + additional_benefits
return(final_profits)
}
mcSimulation(estimate = as.estimate(example_decision_inputs),
model_function = example_decision_model,
numberOfModelRuns = 700,
functionSyntax = "plainNames")
```

Now we have a simulation of possible outcomes. In the model programming seminar this week we will go into more detail on the options for assessment of the results of simulations and on visualization of the results.

Note that this example was constructed for clarity and not for speed. Speed considerations are not very important when we only run a process once, but since in the Monte Carlo simulation all delays are multiplied by `numberOfModelRuns`

(e.g. 10,000), they can sometimes add up to substantial time losses. Even with highly efficient coding, Monte Carlo simulations can take a while when dealing with complex decisions.

The objective of the procedures used in the `decisionSupport`

package is to make it easier for analysts to produce decision-relevant information that adequately reflects the imperfect nature of the information we usually have. Adding probabilistic elements to a simulation adds substantial value to an analysis. Mostly, it avoids making spurious assumptions, replacing uncertainty with ‘best bets’ and producing results that do not reflect the knowledge limitations that make decision-making so challenging. More information on all this is contained in the decisionSupport manual, especially under `welfareDecisionAnalysis`

.

See the model section of the talk by Hoa Do in the Lecture 8 on forecasts (from around 4:40-6 minutes in) for an example of how to present a model overview in succinct and clear way. The model is presented in the first three minutes of the talk.

Welcome to the 8th seminar of **Decision Analysis and Forecasting for Agricultural Development**.Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

In this seminar we will go over some of the functions that are useful for building parts of our models in the `decisionSupport`

package.

Many variables vary over time and it may not be desirable to ignore this variation in time series analyses. The `vv`

function produces time series that contain variation from a specified mean and a desired coefficient of variation. A trend can be added to this time series.

The arguments for the `vv`

function include:

`var_mean`

, which is the mean of the variable to be varied`var_CV`

, which is the desired coefficient of variation (in percent)`n`

, which is the integer; number of values to produce`distribution`

, which is the probability distribution for the introducing variation. This is currently only implemented for normal distributions`absolute_trend`

, which is the absolute increment in the`var_mean`

in each time step. Defaults to`NA`

, which means no such absolute value trend is present. If both absolute and relative trends are specified, only original means are used`relative_trend`

, which is the relative trend in the`var_mean`

in each time step (in percent). Defaults to`NA`

, which means no such relative value trend is present. If both absolute and relative trends are specified, only original means are used`lower_limit`

, which is the lowest possible value for elements of the resulting vector`upper_limit`

, which is the upper possible value for elements of the resulting vector

Note that only one type of trend can be specified. If neither of the trend parameters are NA, the function uses only the original means

The function produces a vector of `n`

numeric values, representing a variable time series, which initially has the mean `var_mean`

, and then increases according to the specified trends. Variation is determined by the given coefficient of variation `var_CV`

.

Create a vector with the `vv`

function and plot with the base R `plot`

function.

```
# reduce var_CV to 5 and change n to 40. Name the output 'valvar' and plot with base R
vv(var_mean = 100,
var_CV = 10,
n = 30)
```

```
valvar <- vv(var_mean = 100,
var_CV = 5,
n = 40)
plot(valvar)
```

Use the `absolute_trend`

argument and plot with the base R `plot`

function. The absolute trend is a raw number added each year, i.e. we add `5`

to each time step in the following example.

```
# reduce var_mean to 50 and make absolute_trend 10. Name the output 'valvar' and plot with base R
vv(var_mean = 100,
var_CV = 10,
n = 30,
absolute_trend = 5)
```

```
valvar <- vv(var_mean = 50,
var_CV = 10,
n = 30,
absolute_trend = 10)
plot(valvar)
```

Use the `relative_trend`

argument and plot with the base R `plot`

function.

```
# reduce var_CV to 5 and change n to 40. Name the output 'valvar' and plot with base R
vv(var_mean = 100,
var_CV = 10,
n = 30,
relative_trend = 5)
```

```
valvar <- vv(var_mean = 100,
var_CV = 5,
n = 40,
relative_trend = 5)
plot(valvar)
```

In many simulations, certain events can either occur or not, and values for dependent variables can depend on which of the cases occurs. This function randomly simulates whether events occur and returns output values accordingly. The outputs can be single values or series of values, with the option of introducing artificial variation into this dataset.

The arguments for the `chance_event`

function include:

`chance`

, which is the probability that the risky event will occur (between 0 and 1)`value_if`

, which is the output value in case the event occurs. This can be either a single numeric value or a numeric vector. Defaults to 1.`value_if_not`

, which is the output value in case the event does not occur. This can be either a single numeric value or a numeric vector. If it is a vector, it must have the same length as`value_if`

`n`

, which is the number of times the risky event is simulated. This is ignored if`length(value_if)>1`

.`CV_if`

, which is the coefficient of variation for introducing randomness into the value_if data set. This defaults to 0 for no artificial variation. See documentation for the`vv`

function for details.`CV_if_not`

, which is the coefficient of variation for introducing randomness into the value_if_not data set. This defaults to the value for`CV_if.`

See documentation for the`vv`

function for details.`one_draw`

, which is the boolean coefficient indicating if event occurrence is determined only once (`TRUE`

) with results applying to all elements of the results vector, or if event occurrence is determined independently for each element (`FALSE`

is the default).

The `chance_event`

function provides a numeric vector of length `n`

, containing outputs of a probabilistic simulation that assigns `value_if`

if the event occurs, or `value_if_not`

if is does not occur (both optionally with artificial variation).

```
# decrease the chance and value_if by half and repeat 20 times. Name the output 'chancevar' and plot with base R
chance_event(chance = 0.5,
value_if = 6,
n = 10)
```

```
chancevar <- chance_event(chance = 0.25,
value_if = 3,
n = 20)
plot(chancevar)
```

Use the `value_if_not`

and `CV_if`

arguments.

```
# make the chance 10 percent, the value_if 5 and the value_if_not 20, repeat 100 times and reduce the coefficient of variation by half. Name the output 'chancevar' and plot with base R.
chance_event(chance = 0.5,
value_if = 1,
value_if_not = 5,
n = 10,
CV_if = 20)
```

```
chancevar <- chance_event(chance = 0.1,
value_if = 5,
value_if_not = 20,
n = 100,
CV_if = 10)
plot(chancevar)
```

Yields of trees or other perennial plants have to be simulated in order to predict the outcomes of many interventions. Unlike annual crops, however, trees normally yield nothing for a few years after planting, following which yields gradually increase until they reach a tree-specific maximum. This is simulated with this function, which assumes that a Gompertz function is a good way to describe this (based on the general shape of the curve, not on extensive research…). The function assumes that yields remain at the maximum level, once this is reached. For long simulations, this may not be a valid assumption! The function parameters are estimated based on yield estimates for two points in time, which the user can specify. They are described by a year number and by a percentage of the maximum yield that is attained at that time.

The arguments for the `gompertz_yield`

function include:

`max_harvest`

, which is the maximum harvest from the tree (in number of fruits, kg or other units)`time_to_first_yield_estimate`

, which is the year (or other time unit) number, for which the first yield estimate is provided by`first_yield_estimate_percent`

`time_to_second_yield_estimate`

, which is the year (or other time unit) number, for which the second yield estimate is provided by`second_yield_estimate_percent`

`first_yield_estimate_percent`

percentage of the maximum yield that is attained in the year (or other time unit) given by`time_to_first_yield_estimate`

`second_yield_estimate_percent`

percentage of the maximum yield that is attained in the year (or other time unit) given by`time_to_second_yield_estimate`

`n_years`

, which is the number of years to run the simulation`var_CV`

, which is the coefficient indicating how much variation should be introduced into the time series. If this is one numeric value, then this value is used for all variables. The default is 0, for a time series with no artificially introduced variation. See description of the`vv`

function for more details on this.`no_yield_before_first_estimate`

, which is the boolean variable indicating whether yields before the time unit indicated by`time_to_first_yield_estimate`

should be 0.

The function provides a vector of `n_years`

numeric values, useful for describing the simulated yield of perennial crops. This starts at 0 and, if the simulation runs for a sufficient number of years, approaches `max_harvest.`

If `var_CV>0`

, this time series includes artificial variation.

```
# create a vector where the maximum harvest is 500, which is achieved in 10 years (i.e. 100% by the second yield estimate)
gompertz_yield(max_harvest = 1000,
time_to_first_yield_estimate = 5,
first_yield_estimate_percent = 10,
time_to_second_yield_estimate = 15,
second_yield_estimate_percent = 90,
n_years = 30)
```

```
gompertz_yield(max_harvest = 500,
time_to_first_yield_estimate = 5,
first_yield_estimate_percent = 10,
time_to_second_yield_estimate = 10,
second_yield_estimate_percent = 100,
n_years = 30)
```

In the seminar you will provide the updated version of your decision model.

To add correlation to your model visit the Adding correlations to Decision Analysis models vignette.

Welcome to the Value of Information seminar for the course **Decision Analysis and Forecasting for Agricultural Development**.Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Since the model inputs are uncertain variables, there is always a chance that the decision turns out to be wrong. Gathering new data to inform the decision, will lead to (on average) better decisions. But how much better? Is it worth the measuring effort? What if we not only *measure* but even *manipulate* the model inputs? These concepts are informally known as “value of clairvoyance” and “value of wizardry”. Please watch the following lecture by Johannes Kopton and enter the magical world of decision analysis:

Technically we assess the possibility of making the wrong decision with a payoff matrix \(Rij\) with the row index \(i\) describing a choice and the column index \(j\) describing a random variable that the decision maker does not yet have knowledge of, that has probability \(pj\) of being in state \(j\). If the decision maker has to choose \(i\) without knowing the value of \(j\), the best choice is the one that maximizes the expected value:

\({\mbox{EMV}}=\max _{i}\sum _{j}p_{j}R_{ij}\)

where \(\sum _{j}p_{j}R_{ij}\) is the expected payoff for action \(i\) i.e. the expectation value, and \({\mbox{EMV}}=\max _{i}\) is choosing the maximum of these expectations for all available actions.

However, with perfect knowledge of \(j\), the decision maker would choose a value of \(i\) that optimizes the expectation for that specific \(j\). Therefore, the expected value given perfect information is

\({\mbox{EV}}|{\mbox{PI}}=\sum _{j}p_{j}(\max _{i}R_{ij})\)

where \(p_{j}\) is the probability that the system is in state \(j\), and \(R_{ij}\) is the pay-off if one follows action \(i\) while the system is in state \(j\). Here \((\max_{i}R_{ij})\) indicates the best choice of action \(i\) for each state \(j\).

The expected value of perfect information is the difference between these two quantities,

\({\mbox{EVPI}}={\mbox{EV}}|{\mbox{PI}}-{\mbox{EMV}}\)

This difference describes, in expectation, how much larger a value the decision maker can hope to obtain by knowing \(j\) and picking the best \(i\) for that \(j\), as compared to picking a value of \(i\) before \(j\) is known. Since \(EV|PI\) is necessarily greater than or equal to \(EMV\), \(EVPI\) is always non-negative.

Projection to Latent Structures (PLS), also sometimes known as Partial Least Squares regression is a multivariate statistical technique that can deal with multiple collinear dependent and independent variables (Wold, Sjöström, and Eriksson 2001). It can be used as another means to assess the outcomes of a Monte Carlo model. Read more in ‘A Simple Explanation of Partial Least Squares’ by Kee Siong Ng.

Variable Importance in Projection (VIP) scores estimate the importance of each variable in the projection used in a PLS mode. VIP is a parameter used for calculating the cumulative measure of the influence of individual \(X\)-variables on the model. For a given PLS dimension, \(a\), the squared PLS weight \((W_a^2\) of that term is multiplied by the explained sum of squares (\(SS\)) of that \(PLS\) dimension; and the value obtained is then divided by the total explained \(SS\) by the PLS model and multiplied by the number of terms in the model. The final \(VIP\) is the square root of that number.

\(VIP_{PLS} = K\times (\frac{[\sum_{a=1}^{A}(W_{a}^{2} \times SSY_{comp,a})]}{SSY_{cum}})\)

Technically, VIP is a weighted combination overall components of the squared PLS weights (\(Wa\)), where \(SSY_{comp,a}\) is the sum of squares of \(Y\) explained by component \(a\), \(A\) is the total number of components, and \(K\) is the total number of variables. The average VIP is equal to 1 because the \(SS\) of all VIP values is equal to the number of variables in \(X\). A variable with a VIP Score close to or greater than 1 (one) can be considered important in given model. The input is a PLS model and the output is a set of column vectors equal in length to the number of variables included in the model. See Galindo-Prieto, Eriksson, and Trygg (2014) for a detailed description of variations of VIP analysis.

In the seminar we will go into detail abut how to calculate PLS and the VIP with the built-in functions of the `decisionSupport`

package.

Welcome to the second part of the Value of Information seminar for the course **Decision Analysis and Forecasting for Agricultural Development**.Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Let’s use a modified version of sheep apple agroforestry from Seminar 7 (#model_programming_cont). Just for this class let’s presume that the farmer has a grassland where they graze sheep. Intervention: implement apple agroforestry. Let’s prepare our input estimates for the system.

```
library(ggplot2)
library(decisionSupport)
```

```
input_estimates <- data.frame(
variable = c(
"sheep_income",
"sheep_cost",
"apple_income",
"apple_cost",
"discount_rate"
),
lower = c(3000, 1000, 30000, 15000, 10),
median = NA,
upper = c(5000, 2500, 60000, 30000, 10),
distribution = c("posnorm", "posnorm", "posnorm", "posnorm", "const"),
label = c(
"Income from sheep (euro/year)",
"Cost of sheep (euro/year)",
"Income from apple (euro/year)",
"Cost of apple (euro/year)",
"Discount Rate"
)
)
# show the result
input_estimates
```

```
input_estimates <- data.frame(
variable = c(
"sheep_income",
"sheep_cost",
"apple_income",
"apple_cost",
"discount_rate"
),
lower = c(3000, 1000, 30000, 15000, 10),
median = NA,
upper = c(5000, 2500, 60000, 30000, 10),
distribution = c("posnorm", "posnorm", "posnorm", "posnorm", "const"),
label = c(
"Income from sheep (euro/year)",
"Cost of sheep (euro/year)",
"Income from apple (euro/year)",
"Cost of apple (euro/year)",
"Discount Rate"
)
)
# show the result
input_estimates
```

If you want you can write the input estimates into a local file on your machine.

```
# write the results to a .csv file
df <- data.frame(input_estimates)
write.csv(df, file = "input_table.csv", row.names = FALSE)
```

In the next step, let’s build a function to model the system and calculate the NPV of the AF system vs only sheep. Simulate it using the `mcSimulation`

function from the `decisionSupport`

package with 5000 ‘numberOfModelRuns’. Using ‘plot_distributions’ plot NPV of intervention trade off vs no intervention.

```
model_function <- function() {
# Estimate the income in a normal season
AF_income <- sheep_income + apple_income
AF_cost <- sheep_cost + apple_cost
# Estimate the final results from the model
AF_final_result <- AF_income - AF_cost
# baseline with sheep only
sheep_only <- sheep_income - sheep_cost
# should I plant trees in the sheep pastures?
Decision_benefit <- AF_final_result - sheep_only
#Calculating NPV
#AF System
AF_NPV <- discount(AF_final_result,
discount_rate = discount_rate,
calculate_NPV = TRUE)#NVP of AF system
NPV_sheep_only <- discount(sheep_only,
discount_rate = discount_rate,
calculate_NPV = TRUE) #NVP of grassland
NPV_decision <- discount(Decision_benefit,
discount_rate = discount_rate,
calculate_NPV = TRUE)
# Generate the list of outputs from the Monte Carlo simulation
return(
list(
NPV_Agroforestry_System = AF_NPV,
NPV_Treeless_System = NPV_sheep_only,
NPV_decision = NPV_decision
)
)
}
```

```
model_function <- function() {
# Estimate the income in a normal season
AF_income <- sheep_income + apple_income
AF_cost <- sheep_cost + apple_cost
# Estimate the final results from the model
AF_final_result <- AF_income - AF_cost
sheep_only <- sheep_income - sheep_cost
Decision_benefit <- AF_final_result - sheep_only
#Calculating NPV
#AF System
AF_NPV <- discount(AF_final_result,
discount_rate = discount_rate,
calculate_NPV = TRUE)#NVP of AF system
NPV_sheep_only <- discount(sheep_only,
discount_rate = discount_rate,
calculate_NPV = TRUE) #NVP of grassland
NPV_decision <- discount(Decision_benefit,
discount_rate = discount_rate,
calculate_NPV = TRUE)
# Generate the list of outputs from the Monte Carlo simulation
return(
list(
NPV_Agroforestry_System = AF_NPV,
NPV_Treeless_System = NPV_sheep_only,
NPV_decision = NPV_decision
)
)
}
apple_sheep_mc_simulation <- mcSimulation(
estimate = as.estimate(input_estimates),
model_function = model_function,
numberOfModelRuns = 80,
functionSyntax = "plainNames"
)
plot_distributions(
mcSimulation_object = apple_sheep_mc_simulation,
vars = c("NPV_decision", "NPV_Treeless_System"),
method = 'smooth_simple_overlay',
base_size = 7,
x_axis_name = "Outcome of AF intervention",
scale_x_continuous(
labels = function(x)
x / 100000
),
ggtitle("Net Present Value of the system"),
legend.position = "bottom"
)
```

Now, let’s perform sensitivity analysis using Partial Least Squares (PLS) regression using ‘plsr’ function. Plot the results showing a `cut_off_line`

at 1 and `threshold`

of 0.5.

```
pls_result_AF <- plsr.mcSimulation(
object = apple_sheep_mc_simulation,
resultName = names(apple_sheep_mc_simulation$y)[3],
ncomp = 1
)
```

```
pls_result_AF <- plsr.mcSimulation(
object = apple_sheep_mc_simulation,
resultName = names(apple_sheep_mc_simulation$y)[3],
ncomp = 1
)
plot_pls(pls_result_AF,
input_table = input_estimates,
cut_off_line = 1,
threshold = 0.5)
```

Save it on your machine with `ggsave`

.

```
plot_pls(pls_result_AF,
input_table = input_estimates,
cut_off_line = 1,
threshold = 0.5)
ggsave(filename = "PLS_sheep_vs_AppleAF.png",
plot = last_plot(),
width = 5,
height = 3)
```

Now, let’s quantify Value of Information by calculating expected value of perfect information (EVPI). Save the results of your MC simulation to a data frame.

```
# assign the results of x and y to a data frame
df <- data.frame(apple_sheep_mc_simulation$x, apple_sheep_mc_simulation$y[1:3])
```

```
# assign the results of x and y to a data frame
df <- data.frame(apple_sheep_mc_simulation$x, apple_sheep_mc_simulation$y[1:3])
```

Generate the Variable Importance in the Projection (VIP) scores of the PLS for the input variables based on the apple sheep model.

```
# run evpi on the NPVs (the last few in the table, starting with "NPV_decision")
EVPI <- multi_EVPI(mc = df, first_out_var = "NPV_decision")
# plot the EVPI results for the decision
plot_evpi(EVPIresults = EVPI, decision_vars = "NPV_decision")
```

```
# run evpi on the NPVs (the last few in the table, starting with "NPV_decision")
EVPI <- multi_EVPI(mc = df, first_out_var = "NPV_decision")
# plot the EVPI results for the decision
plot_evpi(EVPIresults = EVPI, decision_vars = "NPV_decision")
```

Share your result with us in Slack and explain the result.

Welcome to seminar 10 of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

The results of a Monte Carlo model are not always easy to interpret and to communicate about. One important step in communicating these often very large data sets is with good visualization. In previous lectures and seminars we have covered the basics of plotting the distributions of model results for comparisons between decision options. In this lecture we will build on what we learned in the Decision Models lecture and the Model programming seminar to learn more about how to generate useful plots for communicating model results.

In the Model programming part 2 seminar we generated a model called `example_mc_simulation`

. As a quick refresher change the plot of the results of that model from `hist_simple_overlay`

to `boxplot_density`

by using the `method`

argument in the `plot_distributions`

function.

In this seminar we will build on what we learned in the Decision Models lecture and the Model programming part 1 seminar to learn more about how to generate useful forecasts for communicating model results. We will implement another Monte Carlo simulation.

In the Model programming part 2 seminar we generated a model called `example_mc_simulation`

. As a quick refresher change the plot of the results of that model from `boxplot_density`

to `smooth_simple_overlay`

by using the `method`

argument in the `plot_distributions`

function.

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "boxplot_density",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "smooth_simple_overlay",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

First let’s look at a case study on agroforestry interventions in Northwest Vietnam (Do, Luedeling, and Whitney 2020).

Here we build the example_mc_simulation again for use in the following exercises.

```
# input_estimates <- read.csv("data/input_estimates.csv")
model_function <- function(){
income <- Yield * Market_price
overall_costs <- Labor_cost + Management_cost
final_result <- income - overall_costs
return(list(final_result = final_result))
}
example_mc_simulation <- decisionSupport::mcSimulation(estimate = decisionSupport::as.estimate(input_estimates),
model_function = model_function,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
```

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "hist_simple_overlay",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

```
plot_distributions(mcSimulation_object = example_mc_simulation,
vars = "final_result",
method = "boxplot_density",
old_names = "final_result",
new_names = "Outcome distribution for profits")
```

The `plot_cashflow`

function from the `decisionSupport`

package (Luedeling et al. 2021) creates a cashflow plot of the returned list of related outputs from the `mcSimulation`

function using ggplot2 (Wickham et al. 2021). The function automatically defines quantiles (5 to 95% and 25 to 75%) as well as a value for the median.

```
# Plotting the cashflow:
# Create the estimate object (for multiple options):
variable = c("revenue_option1", "costs_option1", "n_years",
"revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000, 5000, 20, 8000, 500)
upper = c(100000, 50000, 20, 80000, 30000)
costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)
# Define the model function without name for the return value:
profit1 <- function(x) {
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)
return(list(Revenues_option1 = revenue_option1,
Revenues_option2 = revenue_option2,
Costs_option1 = costs_option1,
Costs_option2 = costs_option2,
Cashflow_option_one = cashflow_option1,
Cashflow_option_two = cashflow_option2))
}
# Perform the Monte Carlo simulation:
predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
model_function = profit1,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
# Plot the cashflow distribution over time
plot_cashflow(mcSimulation_object = predictionProfit1, cashflow_var_name = "Cashflow_option_one",
x_axis_name = "Years with intervention",
y_axis_name = "Annual cashflow in USD",
color_25_75 = "green4", color_5_95 = "green1",
color_median = "red")
```

Plot the cashflow with panels to compare the cashflow distribution over time for multiple decision options:

```
# Plotting the cashflow:
# Create the estimate object (for multiple options):
variable = c("revenue_option1", "costs_option1", "n_years",
"revenue_option2", "costs_option2")
distribution = c("norm", "norm", "const", "norm", "norm")
lower = c(10000, 5000, 10, 8000, 500)
upper = c(100000, 50000, 10, 80000, 30000)
costBenefitEstimate <- as.estimate(variable, distribution, lower, upper)
# Define the model function without name for the return value:
profit1 <- function(x) {
cashflow_option1 <- vv(revenue_option1 - costs_option1, n = n_years, var_CV = 100)
cashflow_option2 <- vv(revenue_option2 - costs_option2, n = n_years, var_CV = 100)
return(list(Revenues_option1 = revenue_option1,
Revenues_option2 = revenue_option2,
Costs_option1 = costs_option1,
Costs_option2 = costs_option2,
Cashflow_option_one = cashflow_option1,
Cashflow_option_two = cashflow_option2))
}
# Perform the Monte Carlo simulation:
predictionProfit1 <- mcSimulation(estimate = costBenefitEstimate,
model_function = profit1,
numberOfModelRuns = 10000,
functionSyntax = "plainNames")
plot_cashflow(mcSimulation_object = predictionProfit1,
cashflow_var_name = c("Cashflow_option_one", "Cashflow_option_two"),
x_axis_name = "Years with intervention",
y_axis_name = "Annual cashflow in USD",
color_25_75 = "green4", color_5_95 = "green1",
color_median = "red",
facet_labels = c("Option 1", "Option 2"))
```

We can use the `compound_figure`

function to create a simple compound figure of model results and analyses of a binary decision (do or do not do). The figure includes the distribution of the expected outcome, the expected cashflow, as well as the variable importance and the value of information.

```
# Create the estimate object:
cost_benefit_table <- data.frame(label = c("Revenue", "Costs"),
variable = c("revenue", "costs"),
distribution = c("norm", "norm"),
lower = c(100, 500),
median = c(NA, NA),
upper = c(10000, 5000))
# (a) Define the model function without name for the return value:
profit1 <- function() {
Decision <- revenue - costs
cashflow <- rnorm(rep(revenue, 20))
return(list(Revenues = revenue,
Costs = costs,
cashflow = cashflow,
Decision = Decision))
}
compound_figure(model = profit1,
input_table = cost_benefit_table,
decision_var_name = "Decision",
cashflow_var_name = "cashflow",
model_runs = 1e2,
distribution_method = 'smooth_simple_overlay')
```

Here we demonstrate a few more various graphical options to visualize uncertainty intervals of outcomes of Monte Carlo simulations. We create a data set of yield distributions of three different farming practices and use the function `mcmc_intervals()`

from the `bayesplot`

library to plot the data set (Gabry and Mahr 2021).

```
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
color_scheme_set("red")
mcmc_intervals(test,prob = 0.5,prob_outer = 0.9,point_est = "median")
```

Do the same with the with `mcmc_areas()`

function from the `bayesplot`

library (Gabry and Mahr 2021).

```
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
color_scheme_set("blue")
mcmc_areas(test,prob = 0.9,point_est = "median")
```

We can also use `geom_density()`

in `ggplot2`

to compare the spread of different distributions (Wickham et al. 2021):

```
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
stacked_test <- stack(test)
ggplot(stacked_test,
aes(x=values,group=ind,fill=ind )) +
geom_density(colour=NA,alpha=.5) +
ylab("Probability density") +
xlab("Yield")
```

Use `ggplot2`

`geom_histogram()`

function to show the histogram of the data in comparison:

```
test <- data.frame("practice 1" = rnorm(n = 1000, mean = 8, sd = 1.5),
"practice 2" = rnorm(n = 1000, mean = 7, sd = 1),
"practice 3" = rnorm(n = 1000, mean = 5, sd = 0.5))
stacked_test <- stack(test)
ggplot(stacked_test,aes(x=values))+
geom_histogram(data=subset(stacked_test,ind =='practice.1'),
aes(fill = ind), alpha = 0.5, bins = 150) +
geom_histogram(data=subset(stacked_test,ind == 'practice.2'),
aes(fill = ind), alpha = 0.5, bins = 150) +
geom_histogram(data=subset(stacked_test,ind == 'practice.3'),
aes(fill = ind), alpha = 0.5, bins = 150)
```

This week we will read and discuss chapter four of Hubbard’s ‘How To Measure Anything’.

Hubbard, Douglas W. How To Measure Anything: Finding the Value of Intangibles in Business. 2nd ed. Vol. Second Edition. Hoboken, New Jersey: John Wiley & Sons, 2014.

Here we use R’s built in `OrchardSprays`

data to run the example from the `tidyverse`

Violin plot examples (Wickham 2021).

```
ggplot(OrchardSprays, aes(y = decrease, x = treatment, fill = treatment))+
geom_violin() +
geom_boxplot(width = 0.1) +
theme(legend.position = "none")
```

A variation on the example from edav using the `ggridges`

library (Wilke 2021).

```
ggplot(OrchardSprays,
aes(x=decrease,y = treatment,fill = treatment)) +
geom_density_ridges_gradient(scale=2) +
theme_ridges() +
theme(legend.position = "none")
```

More examples on the `rdrr.io`

CRAN website.

To see more options for plotting high dimensional data visit the High Domensional Data vignette.

Welcome to eleventh seminar of **Decision Analysis and Forecasting for Agricultural Development**.Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

Review the process in the Models lecture.

We have already built some simple models in the Decision Models lecture and the Model programming seminar. Now we can explore another simple model with `decisionSupport`

.

This model adapted from Rojas et al. (2021) represents the decision of cherry farmers in Chile to adopt hail nets.

Here we generate an input table to feed the hail model function. Update this with a new management cost variable shown in the graphical impact pathway. Call your new variable `"p_hail"`

, make the lower bound `0.02`

and the upper bound `0.2`

, make the distribution `"posnorm"`

, make the label `"% chance hail"`

and make the description `"Probability of a hail storm"`

.

```
hail_estimates <- data.frame(variable = c("yield",
"var_CV",
"initial_investment",
"price"),
lower = c(6000, 20, 500, 5),
median = NA,
upper = c(14000, 20, 1000, 80),
distribution = c("posnorm",
"const",
"posnorm",
"posnorm"),
label = c("Yield (kg/ha)",
"Coefficient of variation",
"Investment cost (USD)",
"Market price (EUR/kg)"),
Description = c("Yield under normal conditions",
"Coefficient of variation (measure of relative variability)",
"Investment cost",
"Market price achieved for yields (EUR/kg)"))
hail_estimates
```

```
hail_estimates <- data.frame(variable = c("yield",
"var_CV",
"initial_investment",
"price",
"p_hail"),
lower = c(6000, 20, 500, 5, 0.02),
median = NA,
upper = c(14000, 20, 1000, 80, 0.2),
distribution = c("posnorm",
"const",
"posnorm",
"posnorm",
"posnorm"),
label = c("Yield (kg/ha)",
"Coefficient of variation",
"Investment cost (USD)",
"Market price (EUR/kg)",
"% chance hail"),
Description = c("Yield under normal conditions",
"Coefficient of variation (measure of relative variability)",
"Investment cost",
"Market price achieved for yields (EUR/kg)",
"Probability of a hail storm"))
hail_estimates
```

Here we create a function following the graphical impact pathway and using the inputs above to calculate the Net Present Value for the investment in hail nets. We use the `vv()`

function from the `decisionSupport`

package to add more variation over time (Luedeling et al. 2021). We also use the `chance_event()`

function to calculate a `hail_adjusted_yield`

for losses when there is hail.

```
hail_function <- function(){
# use vv() to add variability to the
# random draws of yield and of price
# over a 20 year simulation
yields <- vv(var_mean = yield,
var_CV = var_CV,
n = 20)
prices <- vv(var_mean = price,
var_CV = var_CV,
n = 20)
# use rep() to simulate the initial_investment
# only in the first year (assuming the net lasts 20 years)
invest_costs <- c(initial_investment, rep(0, 19))
# use p_hail in the chance_event()
# to adjust yield for probability of hail
# assuming no yield at all in the event of hail
hail_adjusted_yield <- chance_event(chance = p_hail,
value_if = 0,
value_if_not = yield,
n = 20)
# calculate profit without net
profit_no_net <- hail_adjusted_yield*prices
# calculate profit with the net
profit_with_net <- (yields*prices)-invest_costs
# use 'discount' to calculate net present value
# 'discount_rate' is expressed in percent
NPV_no_net <- discount(profit_no_net, discount_rate = 5, calculate_NPV = TRUE)
NPV_net <- discount(profit_with_net, discount_rate = 5, calculate_NPV = TRUE)
# calculate the overall NPV of the decision (do - don't do)
NPV_decision <- NPV_net - NPV_no_net
return(list(NPV_no_net = NPV_no_net,
NPV_net = NPV_net,
NPV_decision = NPV_decision))
}
```

We can use the `mcSimulation()`

function from the `decisionSupport`

package to implement a model (Luedeling et al. 2021).

```
# Run the Monte Carlo simulation using the model function
hail_mc_simulation <- mcSimulation(estimate = as.estimate(hail_estimates),
model_function = hail_function,
numberOfModelRuns = 200,
functionSyntax = "plainNames")
hail_mc_simulation
```

Here we show the results of a Monte Carlo simulation (200 model runs) for estimating the comparative profits with and without hail nets.

```
plot_distributions(mcSimulation_object = hail_mc_simulation,
vars = c("NPV_no_net", "NPV_net"),
method = 'smooth_simple_overlay',
base_size = 7)
```

Calculate Value of Information (VoI) analysis with the Expected Value of Perfect Information (EVPI). As we learned in Lecture 8 on forecasts, EVPI helps us determine if more research will be helpful in understanding the expected change in the value of a decision outcome through reducing uncertainty on a given variable.

Use the function `data.frame()`

to transform the x and y outputs of the `mcSimulation()`

function results for EVPI calculation. We use the `multi_EVPI()`

function to calculate the EVPI for multiple independent variables. For the first_out_var argument we choose `NPV_decision`

from the input table since this is the first output variable (the only variable) and will the subject of the EVPI.

```
# subset the outputs from the mcSimulation function (y)
# to run the multi_EVPI only on the variables that the we want
# (i.e. the NPV_decision)
mcSimulation_table_hail <- data.frame(hail_mc_simulation$x,
hail_mc_simulation$y[3])
evpi_hail <- multi_EVPI(mc = mcSimulation_table_hail,
first_out_var = "NPV_decision")
```

We use the function `plot_evpi()`

on the results from `multi_EVPI()`

to plot the Expected Value of Perfect Information (EVPI). Here we show the results with the standard settings. The length of the bars is equal to EVPI.

`plot_evpi(evpi_hail, decision_vars = "NPV_decision")`

Once you have followed and run the code above on your machine it is a good time to look through the outline of these procedures in the example by Lanzanova et al. (2019) called ‘Sediment management in a reservoir in Burkina Faso’. It runs a Monte-Carlo-based selection of sedimentation management strategies for a reservoir in the Upper Volta River Basin of Burkina Faso (Lanzanova et al. 2019).

*Tip*: If you want to play with this code you can find the Rmarkdown file and the estimate table in the GitHub repository.

Follow Chelsea Parlett Pelleriti’s Bayesian statistics work

Listen to the ‘Learning Bayesian Statistics’ podcast

Rojas, Gonzalo, Eduardo Fernandez, Cory Whitney, Eike Luedeling, and Italo F. Cuneo. “Adapting Sweet Cherry Orchards to Extreme Weather Events – Decision Analysis in Support of Farmers’ Investments in Central Chile.” Agricultural Systems 187 (February 2021): 103031. https://doi.org/10.1016/j.agsy.2020.103031.

Welcome to the 12th seminar of **Decision Analysis and Forecasting for Agricultural Development**. Feel free to bring up any questions or concerns in the Slack or to Dr. Cory Whitney or the course tutor.

In this seminar, we will discuss how to use Zotero for citations in RMarkdown. If you use an alternative reference management system, you are encouraged to find out how to use it with RMarkdown. However, in this course let’s work with Zotero.

Let’s start by downloading and installing Zotero.

**1. Install Zotero :**

- If you haven’t used Zotero, please download and install Zotero from zotero.org.

**2. Export your Zotero library to a .bib file**:

- Open Zotero.
- Go to
`File`

, select`Export Library`

and choose the`BibTeX`

format.

- Save the exported .bib file to your project directory.

**3. Add the .bib file to your RMarkdown document**:

- In your RMarkdown document, specify the path to the .bib file using the
`bibliography`

field in the YAML header. - Optionally, include a
`csl`

field if you want to use a specific citation style. Different citation styles can be downloaded from Citation Style Language. Save the exported csl file to your project directory.

**4. Cite references in your RMarkdown document**:

- Use
`[@citation_key]`

to cite references within the text.

Here’s an example of how to set this up in an RMarkdown document:

```
---
title: "Your Document Title"
author: "Your Name"
date: "2024-07-24"
output:
pdf_document: default
html_document: default
bibliography: references.bib
csl: nature.csl # Optional, specify a citation style
---
# Introduction
This is an example of citing a reference [@author2020].
# Method
Describe your method here.
# Results
Present your results here.
# Conclusion
Conclude your document here.
# References
```

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2021. *Rmarkdown: Dynamic Documents for r*. https://CRAN.R-project.org/package=rmarkdown.

Bertsch McGrayne, Sharon. 2011. *The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy*. Yale University Press.

Do, Hoa, Eike Luedeling, and Cory Whitney. 2020. “Decision Analysis of Agroforestry Options Reveals Adoption Risks for Resource-Poor Farmers.” *Agronomy for Sustainable Development* 40 (3): 20. https://doi.org/10.1007/s13593-020-00624-5.

Fernandez, Eduardo, Gonzalo Rojas, Cory Whitney, Italo F Cuneo, and Eike Luedeling. 2021. “Data and Scripts: Adapting Sweet Cherry Orchards to Extreme Weather Events – Decision Analysis in Support of Farmers’ Investments in Central Chile.”

Fernandez, Eduardo, Cory Whitney, and Eike Luedeling. 2021. “Applying the mcSimulation Function in decisionSupport.” https://cran.r-project.org/web/packages/decisionSupport/vignettes/example_decision_function.html.

Gabry, Jonah, and Tristan Mahr. 2021. *Bayesplot: Plotting for Bayesian Models*. https://mc-stan.org/bayesplot/.

Galindo-Prieto, Beatriz, Lennart Eriksson, and Johan Trygg. 2014. “Variable Influence on Projection (VIP) for Orthogonal Projections to Latent Structures (OPLS): Variable Influence on Projection for OPLS.” *Journal of Chemometrics* 28 (8): 623–32. https://doi.org/10.1002/cem.2627.

Howard, Ronald A., and Ali E. Abbas. 2015. *Foundations of Decision Analysis*. NY, NY: Prentice Hall.

Hubbard, Douglas W. 2014. *How To Measure Anything: Finding the Value of Intangibles in Business*. 2nd ed. Vol. Second Edition. Hoboken, New Jersey: John Wiley & Sons.

Kahneman, Daniel, and Patrick Egan. 2011. *Thinking, Fast and Slow*. Vol. 1. New York: Farrar, Straus; Giroux.

Lanzanova, Denis, Cory Whitney, Keith Shepherd, and Eike Luedeling. 2019. “Improving Development Efficiency Through Decision Analysis: Reservoir Protection in Burkina Faso.” *Environmental Modelling & Software* 115 (May): 164–75. https://doi.org/10.1016/j.envsoft.2019.01.016.

Luedeling, Eike, Lutz Goehring, Katja Schiffers, Cory Whitney, and Eduardo Fernandez. 2021. *decisionSupport: Quantitative Support of Decision Making Under Uncertainty*. http://www.worldagroforestry.org/.

Luedeling, Eike, and JDe Leeuw. 2014. “Water for Wajir.” *Worldagroforestry.org*.

Luedeling, Eike, Arjen L. Oord, Boniface Kiteme, Sarah Ogalleh, Maimbo Malesu, Keith D. Shepherd, and Jan De Leeuw. 2015. “Fresh Groundwater for Wajir - Ex-Ante Assessment of Uncertain Benefits for Multiple Stakeholders in a Water Supply Project in Northern Kenya.” *Frontiers in Environmental Science* 3, article 16: 1–18. https://doi.org/10.3389/fenvs.2015.00016.

Luedeling, Eike, and Keith Shepherd. 2016. “Decision-Focused Agricultural Research.” *Solutions* 7 (5): 46–54. https://doi.org/https://www.thesolutionsjournal.com/article/decision-focused-agricultural-research/.

R Core Team. 2021. *R: A Language and Environment for Statistical Computing*. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Robinson, Ken. 2017. *Out of Our Minds: Learning to Be Creative*. Oxford, United Kingdom: John Wiley & Sons.

Rojas, Gonzalo, Eduardo Fernandez, Cory Whitney, Eike Luedeling, and Italo F. Cuneo. 2021. “Adapting Sweet Cherry Orchards to Extreme Weather Events – Decision Analysis in Support of Farmers’ Investments in Central Chile.” *Agricultural Systems* 187 (February): 103031. https://doi.org/10.1016/j.agsy.2020.103031.

Ruett, Marius, Cory Whitney, and Eike Luedeling. 2020. “Model-Based Evaluation of Management Options in Ornamental Plant Nurseries.” *Journal of Cleaner Production* 271 (June): 122653. https://doi.org/10.1016/j.jclepro.2020.122653.

Savage, Sam L., and Harry M. Markowitz. 2009. *The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty*. John Wiley & Sons.

Shepherd, Keith, Douglas Hubbard, Norman Fenton, Karl Claxton, Eike Luedeling, and Jan de Leeuw. 2015. “Development Goals Should Enable Decision-Making.” *Nature* 523 (7559): 152–54.

Tetlock, Philip E., and Dan Gardner. 2015. *Superforecasting: The Art and Science of Prediction*. New York, NY: Crown Publishers.

Whitehead, Alfred North, and Bertrand Russell. 2011. *Principia Mathematica*. San Bernardio, CA: Rough Draft Printing.

Whitney, Cory, Denis Lanzanova, Caroline Muchiri, Keith D. Shepherd, Todd S. Rosenstock, Michael Krawinkel, John R. S. Tabuti, and Eike Luedeling. 2018. “Probabilistic Decision Tools for Determining Impacts of Agricultural Development Policy on Household Nutrition.” *Earth’s Future* 6 (3): 359–72. https://doi.org/10.1002/2017EF000765.

Wickham, Hadley. 2021. *Tidyverse: Easily Install and Load the Tidyverse*. https://CRAN.R-project.org/package=tidyverse.

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2021. *Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics*. https://CRAN.R-project.org/package=ggplot2.

Wilke, Claus O. 2021. *Ggridges: Ridgeline Plots in Ggplot2*. https://wilkelab.org/ggridges/.

Wold, Svante, Michael Sjöström, and Lennart Eriksson. 2001. “PLS-Regression: A Basic Tool of Chemometrics.” *Chemometrics and Intelligent Laboratory Systems*, PLS Methods, 58 (2): 109–30. https://doi.org/10.1016/S0169-7439(01)00155-1.