Computers and Technology, 27.11.2019 02:31 xojade

Consider an agent starting in a room A in which it can take two possible actions: leave the room (action "l") or stay (action "s"). If it leaves A, the agent moves to room B, which is a terminal state (no more actions can be taken). The outcomes of the actions are uncertain, so that when executing action l (or action s), there is some probability that the agent will leave A (or stay in A). We assume that the reward for entering state B is R(B) = +1 and the reward for being in state A is R(A) = -0.1.

(a) Draw the (very simple) diagram corresponding to this MDP. Answer by inspection of the diagram: what is the optimal policy?

(b) Assume that the agent knows neither the world (transition probabilities) nor the utilities of the states. Assume that the agent, for some reason, happens to follow the optimal policy. The rewards received at states A and B are the same as described above. In the process of executing this policy, the agent executes four trials and, in each trial, it stops after reaching state B. The following state sequences are recorded during the trials: AAAB, AAB, AB, AB. What is the estimate of T? What is the estimate of U(A), assuming a discount factor of γ = 0.5?
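One possible way to approach part (b) is sketched below in Python. The question does not say which estimation method is intended, so this sketch makes several assumptions: the optimal policy is taken to be "l" (leave) in A, T is estimated from transition counts in the recorded trials, utilities follow the convention U(s) = R(s) + γ Σ T(s, a, s') U(s') with U(B) = R(B) at the terminal state, and U(A) is computed two ways (by solving that equation with the estimated T, and by averaging the discounted returns observed after every visit to A). The function and variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Data taken from the question; exact fractions avoid rounding noise.
TRIALS = ["AAAB", "AAB", "AB", "AB"]
REWARD = {"A": Fraction(-1, 10), "B": Fraction(1)}
GAMMA = Fraction(1, 2)


def estimate_transitions(trials):
    """Estimate T(s, s') under the followed policy from observed counts."""
    counts = Counter()
    for trial in trials:
        for s, s_next in zip(trial, trial[1:]):
            counts[(s, s_next)] += 1
    totals = Counter()
    for (s, _), n in counts.items():
        totals[s] += n
    return {(s, s2): Fraction(n, totals[s]) for (s, s2), n in counts.items()}


def model_based_utility_of_a(t):
    """Solve U(A) = R(A) + gamma * (T_AA * U(A) + T_AB * U(B)) for U(A).

    Assumption: B is terminal, so U(B) = R(B).
    """
    u_b = REWARD["B"]
    numerator = REWARD["A"] + GAMMA * t[("A", "B")] * u_b
    return numerator / (1 - GAMMA * t[("A", "A")])


def direct_utility_of_a(trials):
    """Average the discounted return observed after every visit to A."""
    samples = []
    for trial in trials:
        for i, s in enumerate(trial):
            if s == "A":
                ret = sum(GAMMA**k * REWARD[x] for k, x in enumerate(trial[i:]))
                samples.append(ret)
    return sum(samples) / len(samples)


T = estimate_transitions(TRIALS)
U_A_MODEL = model_based_utility_of_a(T)
U_A_DIRECT = direct_utility_of_a(TRIALS)

print("T(A,l,A) =", T[("A", "A")])          # 3/7
print("T(A,l,B) =", T[("A", "B")])          # 4/7
print("U(A), model-based =", U_A_MODEL)      # 13/55, about 0.236
print("U(A), direct average =", U_A_DIRECT)  # 1/4
```

The two estimates of U(A) differ because they use the data differently: the model-based value plugs the estimated T into the utility equation, while the direct average only looks at the returns actually observed. Which one the question expects depends on the method covered in the course.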

