This thesis investigates a model of informational nudging. It focuses on a situation where a decision maker faces a set of alternatives, each of which yields a stochastic reward. The decision maker (user) repeatedly chooses from this set so as to maximize the reward she obtains. She remembers her past experiments and maintains a reward estimate for each alternative, which guides her future decisions. The estimate is updated by averaging the reward of the alternative just chosen with its past estimate, using a non-summable, square-summable sequence of averaging factors, while the estimates of the unchosen alternatives are left unchanged. The decision process is repeated over an infinite time horizon, and the relative weight the user gives to a new experiment compared with her past experiments decreases over time. This is a key assumption for studying the asymptotic behavior of the process, since we use stochastic averaging techniques. At each step, the user selects an alternative according to her payoff estimates and a logit rule. Under this model the user gathers information about only one alternative per step, so the estimate of a rarely chosen alternative is seldom updated. We therefore introduce a recommender who provides information about the unchosen alternatives at every step, making it possible for the user to update the payoff estimates of all alternatives at every step of the process. This modifies the payoff estimates, the subsequent choices of the user, and hence the whole decision process. We are particularly interested in the situation where the recommender provides incorrect or misleading information to influence the decision maker's behavior, as a way to achieve more desirable equilibria. Building on the theory of stochastic averaging, control strategies are derived to enforce a desired equilibrium.
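The dynamics described above can be sketched in a small simulation. This is only an illustrative reading of the abstract, not the thesis's actual model or parameters: the mean rewards, the logit sensitivity `beta`, the step sizes `1/n`, the `target` alternative, and the constant `inflated_report` sent by the recommender are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three alternatives with true mean rewards unknown to the user.
mu = np.array([1.0, 0.5, 0.2])   # assumed mean rewards
beta = 2.0                        # assumed logit sensitivity
target = 2                        # alternative the recommender promotes (assumption)
inflated_report = 1.5             # misleading payoff reported for the target
T = 20000

x = np.zeros(3)                   # user's payoff estimates
for n in range(1, T + 1):
    # Logit choice rule: P(i) proportional to exp(beta * x[i]).
    p = np.exp(beta * (x - x.max()))
    p /= p.sum()
    i = rng.choice(3, p=p)

    gamma = 1.0 / n               # non-summable, square-summable averaging factors
    for j in range(3):
        if j == i:
            signal = mu[j] + rng.normal(scale=0.1)   # reward actually experienced
        elif j == target:
            signal = inflated_report                 # misleading recommendation
        else:
            signal = mu[j] + rng.normal(scale=0.1)   # truthful recommendation
        # Stochastic averaging update of the payoff estimate.
        x[j] += gamma * (signal - x[j])
```

In this sketch the truthfully reported alternatives converge to their true means, while the target's estimate is pulled toward the inflated report whenever it is not chosen, raising its choice probability well above what honest feedback would produce.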
On informational nudging and control of payoff-based learning