The objective of this dissertation research is to explore and analyze advanced high-performance load-balanced switch architectures that scale well in both switch size (the number of switch ports) and link speed, provide throughput guarantees, achieve low latency, and maintain packet ordering. Load-balanced switch (LBS) architectures are known to be scalable in both size and speed, which is of particular interest given the continued exponential growth in Internet traffic. However, the main drawback of load-balanced switches is that packets can depart from the switch out of order, and the modifications proposed to mitigate this packet reordering problem tend to increase packet delay significantly compared to the basic load-balanced switch. In this dissertation research, we investigated several methodologies to address this issue. The first approach is to rectify the packet reordering problem by simply buffering and re-sequencing out-of-order packets at the switch outputs. We formally bound the worst-case amount of time that a packet must wait in these output reordering buffers before it is guaranteed, with high probability, to be ready for in-order departure, and we prove that this bound is linear in the switch size. The second approach is a randomized load-balancing scheme that forces all packets belonging to the same application flow to follow the same path through the switch, together with two safety mechanisms that uniformly diffuse packets across the switch whenever packets build up at the same intermediate port. Although simple and intuitive, our schemes substantially outperform existing load-balanced switch architectures in our experimental results.
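To illustrate the idea behind the second approach, the following is a minimal sketch (not the dissertation's actual implementation) of hash-based flow-to-path assignment in a load-balanced switch: every packet of a flow hashes to the same intermediate port, which preserves per-flow order, and a simple threshold-based fallback diffuses traffic away from a congested intermediate port. The port count, queue threshold, and all function names here are assumptions made for this example.

```python
# Illustrative sketch only: per-flow hashed path selection with a
# congestion fallback. Values and names are assumptions, not the
# dissertation's scheme.
import hashlib

N = 8                      # number of switch ports (assumed)
QUEUE_THRESHOLD = 32       # queue build-up trigger for the fallback (assumed)
queue_len = [0] * N        # current occupancy of each intermediate-port queue

def intermediate_port(flow_id: tuple) -> int:
    """Choose an intermediate port for a packet of the given flow.

    All packets of the same flow hash to the same port, so they follow
    the same path and stay in order. If that port's queue has built up
    past the threshold, fall back to the least-loaded port so packets
    are diffused across the switch instead of piling up.
    """
    digest = hashlib.sha256(repr(flow_id).encode()).digest()
    port = digest[0] % N
    if queue_len[port] >= QUEUE_THRESHOLD:
        port = min(range(N), key=lambda p: queue_len[p])
    return port

# Usage: a flow identified by its 5-tuple always maps to the same port
# until its hashed port becomes congested.
flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
print(intermediate_port(flow))
```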