subject

Consider an MDP with 3 states, A, B, and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that the agent never remains in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning. Assume the discount factor γ is 0.5 and the Q-learning step size α is 0.5.

Our current Q function, Q(s, a), is:

                     A        B        C
Clockwise            1.501   -0.451    2.73
Counterclockwise     3.153   -6.055    2.133

The agent encounters the following samples:

s    a                   s'   r
A    Counterclockwise    C    8.0
C    Counterclockwise    A    0.0
Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
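
One way to work this out is to apply the standard Q-learning update, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max over a' of Q(s', a')], to each sample in order. The sketch below does that under stated assumptions: the Q-table is read with actions as rows and states A, B, C as columns, and the two samples are taken to be (A, Counterclockwise, C, 8.0) followed by (C, Counterclockwise, A, 0.0). The starting state of the second sample is an assumption (it continues from where the first sample ended). The names `q_update`, `q`, and `samples` are illustrative only.

```python
GAMMA = 0.5   # discount factor from the problem statement
ALPHA = 0.5   # Q-learning step size from the problem statement
ACTIONS = ("Clockwise", "Counterclockwise")


def q_update(q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place for the sample (s, a, s_next, r)."""
    # Bootstrapped target: reward plus discounted best value in the next state.
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target


# Current Q-values, keyed by (state, action).
q = {
    ("A", "Clockwise"): 1.501,
    ("B", "Clockwise"): -0.451,
    ("C", "Clockwise"): 2.73,
    ("A", "Counterclockwise"): 3.153,
    ("B", "Counterclockwise"): -6.055,
    ("C", "Counterclockwise"): 2.133,
}

# Samples as (s, a, s', r). The "C" start state of the second sample is assumed.
samples = [
    ("A", "Counterclockwise", "C", 8.0),
    ("C", "Counterclockwise", "A", 0.0),
]

# Order matters: the second update uses the Q(A, .) values changed by the first.
for s, a, s_next, r in samples:
    q_update(q, s, a, s_next, r)

for (state, action), value in sorted(q.items()):
    print(f"Q({state}, {action}) = {value:.5f}")
```

Under these assumptions only two entries change. The first sample gives Q(A, Counterclockwise) = 0.5 · 3.153 + 0.5 · (8.0 + 0.5 · 2.73) = 6.259. The second sample then uses that updated value: Q(C, Counterclockwise) = 0.5 · 2.133 + 0.5 · (0.0 + 0.5 · 6.259) = 2.63125. The other four Q-values stay as given.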

ansver
Answers: 3
