The stochastic sequential assignment problem (SSAP) studies the allocation of distinct available workers with deterministic values to sequentially arriving tasks with stochastic parameters, so as to maximize the expected total reward obtained from the assignments. The difficulty in making the assignment decisions is that they must be made in real time; specifically, a worker is paired with a task without knowledge of future task values. This thesis studies practical variations and extensions of the SSAP, with the goal of removing restrictive assumptions so that the problem setting more closely matches that of real-world problems.

The existing SSAP literature considers a risk-neutral objective function, seeking an assignment policy that maximizes the expected total reward. A risk-neutral objective, however, is not always desirable for the decision-maker, since the distribution of the total reward might carry a high probability of low values. To account for this, the first part of this dissertation studies the SSAP under a risk-sensitive objective function. Specifically, the assignments are performed so as to minimize the threshold probability, i.e., the probability that the total reward fails to achieve a specified target (threshold). A target-dependent Markov decision process (MDP) is solved, and sufficient conditions for the existence of a deterministic Markov optimal policy are provided. An approximate algorithm is presented, and convergence of the approximate value function to the optimal value function is established under mild conditions.

The second part of this thesis analyzes the limiting behavior of the SSAP as the number of assignments approaches infinity. The optimal assignment policy for the basic SSAP has a threshold structure and requires computing a new set of breakpoints upon the arrival of each task, which is cumbersome for large-scale problems. To address this issue, the second part focuses on obtaining stationary (time-independent) optimal assignment policies that maximize the long-run expected reward per task and are much easier to implement in real-world problems. An exponential convergence rate is established for the convergence of the expected reward per task to its optimal value as the number of tasks approaches infinity. The limiting behavior is studied in two settings: the first assumes an independent and identically distributed (IID) sequence of arriving tasks with observable distribution functions, while the second considers the case where task distributions are unobservable and belong to a pool of feasible distributions.

The third part of this dissertation brings the first two parts together, studying the limiting behavior of the target-dependent SSAP, where the goal is to find an assignment policy that minimizes the probability of the long-run reward per task failing to achieve a given target value. It is proven that the stationary policy of the second part, which maximizes the long-run expected reward per task, also minimizes the long-run threshold probability in this problem.
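For concreteness, the two long-run criteria involved here can be sketched, in illustrative notation that is assumed rather than taken from the thesis, as
\[
\max_{\pi}\ \lim_{n\to\infty}\frac{1}{n}\,\mathbb{E}_{\pi}\!\left[\sum_{t=1}^{n} R_t\right]
\qquad\text{and}\qquad
\min_{\pi}\ \lim_{n\to\infty}\mathbb{P}_{\pi}\!\left(\frac{1}{n}\sum_{t=1}^{n} R_t < \tau\right),
\]
where \(R_t\) denotes the reward collected from the \(t\)-th assignment, \(\pi\) ranges over admissible assignment policies, and \(\tau\) is the given target value.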
That a single assignment policy simultaneously optimizes both of these objective functions is noteworthy, since the threshold criterion, by definition, deviates from the expected-reward criterion: although it strives to avoid falling below a given target level, it does not automatically guarantee reasonable performance in terms of the expected reward.

Finally, the last part of this thesis extends the stochasticity in the SSAP to the worker values, which are modeled as random variables that take on new values upon each task arrival. Four models are introduced that analyze this problem from different angles: whether the worker values share a common distribution or have distinct distributions, whether the worker values in a given time period depend on those in preceding time periods (or on one another within the same period), whether the worker values are deterministic but are functions of time (possibly deteriorating over time), and whether the task values are independent of or dependent on one another. For each of these models, an optimal assignment policy that maximizes the expected total reward is presented.
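As a point of reference, in the classical SSAP formulation the reward of pairing a worker of value \(p\) with a task of value \(x\) is the product \(px\). A minimal sketch of the objective in this last part, under that product-form reward and in illustrative notation assumed here, is
\[
\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=1}^{n} P_{\pi(t),\,t}\, X_t\right],
\]
where \(X_t\) is the value of the \(t\)-th arriving task, \(P_{j,t}\) is the (now random) value of worker \(j\) at that arrival, and \(\pi(t)\) is the index of the worker assigned to the task; the four models above differ in the distributional and dependence assumptions placed on the \(P_{j,t}\) and the \(X_t\).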