In a bandit problem there is a set of arms, each of which, when played by an agent, yields a reward that depends on its internal state, which evolves stochastically over time. In this thesis we consider bandit problems in an online framework involving sequential decision-making under uncertainty. Within this class of problems, agents that are initially unaware of the stochastic evolution of the environment (the arms) aim to maximize a common objective based on the history of actions and observations. The classical difficulty in a bandit problem is the exploration-exploitation dilemma, which necessitates careful algorithm design to balance gathering information and making the best use of available information to achieve optimal performance. The motivation to study bandit problems comes from their diverse applications, including cognitive radio networks, opportunistic spectrum access, network routing, web advertising, clinical trials, contract design, and many others. Since the characteristics of the agents differ across these applications, our goal is to provide an agent-centric approach to designing online learning algorithms for bandit problems.

When there is a single agent, in contrast to the classical work on bandit problems which assumes IID arms, we develop learning algorithms for Markovian arms while taking computational complexity into account. Depending on the computational power of the agent, we show that different performance levels, ranging from optimality in weak regret to strong optimality, can be achieved.

Apart from classical single-agent bandits, we also consider the novel area of multi-agent bandits, which has informational decentralization and communication aspects not present in the single-agent setting. For this setting, we develop distributed online learning algorithms that are optimal in terms of weak regret, depending on the communication and computation constraints.
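Since weak regret serves as the central performance metric above, one common formulation (an illustrative assumption; the precise definition used for Markovian arms may differ) compares the agent's expected cumulative reward over a horizon $T$ with that of always playing the single best arm:
\[
R_{\mathrm{weak}}(T) \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_{a_t}(t)\right],
\qquad \mu^{*} \;=\; \max_{k}\,\mu_k,
\]
where $\mu_k$ is the mean reward of arm $k$ (for Markovian arms, the mean under the arm's stationary distribution), $a_t$ is the arm played at time $t$, and $r_{a_t}(t)$ is the observed reward.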
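To make the exploration-exploitation trade-off concrete, the following is a minimal sketch of the classical single-agent IID baseline that the thesis departs from, assuming Bernoulli arms and the standard UCB1 index; the function name and parameters are illustrative and not taken from the thesis.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Minimal UCB1 sketch on IID Bernoulli arms (illustrative baseline only).

    `means` holds the success probabilities of the arms, unknown to the learner.
    The index adds an exploration bonus to each arm's empirical mean, which is
    one standard way to balance exploration and exploitation.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms        # number of times each arm has been played
    totals = [0.0] * n_arms      # cumulative reward collected from each arm
    reward_sum = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # play every arm once to initialize its index
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_k)
            arm = max(
                range(n_arms),
                key=lambda k: totals[k] / counts[k]
                + math.sqrt(2 * math.log(t) / counts[k]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward

    # Weak regret against the single best arm over the horizon
    return horizon * max(means) - reward_sum

if __name__ == "__main__":
    print(ucb1([0.3, 0.5, 0.7], horizon=10000))
```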