The solution of elliptic partial differential equations is a common performance bottleneck in scientific simulations. By exploiting structure in a problem, robust structured multigrid methods gain important performance benefits because they preserve structure throughout the multigrid hierarchy. In parallel, these methods benefit from nearest-neighbor, stencil-based communication patterns; however, the increased communication demands of coarse-grid problems and of the block smoothers needed for a robust solver challenge parallel efficiency. In this dissertation, methods for reducing parallel communication through changes in the parallel implementation are explored.

To reduce communication costs for coarse-grid problems, recursive agglomeration of tasks on a logically structured grid of processors is considered. The associated redistribution communication is optimized using a predictive performance model that guides how tasks are agglomerated. This approach provides an efficient strategy for parallel coarsening in a structured setting that can adapt to changes in the target architecture or multigrid algorithm, since these are captured by the performance model. Parallel results show favorable weak scaling with this strategy out to \(500\)k cores and demonstrate the consistency of the performance model in quantifying the cost of various redistribution decisions.

To reduce communication costs in block smoothers, an automated strategy for aggregating communication across blocks is considered. With minor changes to a block solver, enabled by the introduction of a service abstraction layer, user-level threads are used to execute blocks concurrently so that communication can be aggregated. This reduces the number of messages sent during a block smoothing operation. The strategy is demonstrated in plane smoothing, where it extends the strong scaling limit by reducing communication latency costs. Parallel results demonstrate scalable multilevel relaxation with \(\log p\) communication complexity and plane relaxation with automated communication aggregation that doubles the strong scaling performance of a V-cycle.

Lastly, the application of robust structured solvers to emerging heterogeneous architectures is considered. Benchmarks are used to develop a performance expectation for structured matrix-based operations on each target processing unit. OpenMP with unified memory is then used to offload solve-phase operations in Cedar, an open-source structured variational multigrid solver. The performance expectation provides context for the performance gains obtained by targeting GPUs on Sierra, a Power9-based system at Lawrence Livermore National Laboratory. Results show a speedup of a Cedar V-cycle on a V100 GPU over a Power9 CPU that is consistent with the approximate speedup estimated by comparing achievable memory bandwidth on the two processing units.
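
As an illustration of the model-guided redistribution decision described above, the following C++ sketch selects among candidate agglomeration factors by minimizing a simple latency-plus-bandwidth communication cost. The cost function, its parameters, and the candidate set are hypothetical stand-ins for the predictive performance model developed in the dissertation.

\begin{verbatim}
#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical cost model: time to redistribute a coarse grid with n_local
// unknowns per task when agglomerating factor x factor tasks onto one task.
// alpha is a per-message latency, beta a per-word transfer cost; both are
// placeholders for measured machine parameters.
double redistribution_cost(std::int64_t n_local, int factor,
                           double alpha = 1.0e-6, double beta = 1.0e-9) {
  const int msgs = factor * factor - 1;  // tasks sending to the block root
  const double words = static_cast<double>(msgs) * n_local;
  return msgs * alpha + words * beta;
}

// Pick the agglomeration factor with the smallest predicted cost.
int choose_factor(std::int64_t n_local, const std::vector<int>& candidates) {
  int best = candidates.front();
  double best_cost = redistribution_cost(n_local, best);
  for (int f : candidates) {
    const double c = redistribution_cost(n_local, f);
    if (c < best_cost) { best_cost = c; best = f; }
  }
  return best;
}

int main() {
  const std::vector<int> candidates{2, 4, 8};
  std::cout << "chosen factor: " << choose_factor(64, candidates) << "\n";
}
\end{verbatim}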
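
The message aggregation across blocks can be pictured as batching: rather than each block posting its own halo message to a neighboring rank, messages destined for the same rank are packed and sent once. The sketch below is a minimal, MPI-free illustration of that packing step; the structures and names are hypothetical, and the user-level threading and service abstraction layer of the actual implementation are omitted.

\begin{verbatim}
#include <iostream>
#include <map>
#include <vector>

// One halo message produced by a single block of a block smoother.
struct BlockMessage {
  int dest_rank;                // neighboring rank this data is destined for
  std::vector<double> payload;  // boundary values for that block
};

// Pack all per-block messages bound for the same rank into one buffer, so
// the number of sends per smoothing sweep drops from
// (#blocks x #neighbors) to (#neighbors).
std::map<int, std::vector<double>>
aggregate(const std::vector<BlockMessage>& msgs) {
  std::map<int, std::vector<double>> packed;
  for (const auto& m : msgs) {
    auto& buf = packed[m.dest_rank];
    buf.insert(buf.end(), m.payload.begin(), m.payload.end());
  }
  return packed;
}

int main() {
  std::vector<BlockMessage> msgs = {
      {1, {1.0, 2.0}}, {1, {3.0}}, {2, {4.0, 5.0}}};
  auto packed = aggregate(msgs);
  std::cout << msgs.size() << " block messages aggregated into "
            << packed.size() << " sends\n";  // prints "3 ... 2"
}
\end{verbatim}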
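
To illustrate the offload approach at the level of a single solve-phase kernel, the sketch below applies a weighted Jacobi sweep for a 5-point stencil inside an OpenMP target region. It assumes a compiler and runtime that honor the OpenMP 5.0 unified shared memory requirement, so no explicit map clauses are needed; it is a generic example and is not taken from Cedar itself.

\begin{verbatim}
#include <vector>

// Assume unified (managed) memory, so host vectors are directly
// accessible inside target regions without map clauses.
#pragma omp requires unified_shared_memory

// One weighted Jacobi sweep for a 5-point Poisson stencil on an
// nx x ny interior grid with a one-cell halo (row-major storage),
// assuming unit mesh spacing.
void jacobi_sweep(int nx, int ny, double w,
                  const std::vector<double>& u_old,
                  const std::vector<double>& f,
                  std::vector<double>& u_new) {
  const int stride = nx + 2;
  const double* uo = u_old.data();
  const double* rhs = f.data();
  double* un = u_new.data();

  #pragma omp target teams distribute parallel for collapse(2)
  for (int j = 1; j <= ny; ++j) {
    for (int i = 1; i <= nx; ++i) {
      const int k = j * stride + i;
      const double update =
          0.25 * (uo[k - 1] + uo[k + 1] +
                  uo[k - stride] + uo[k + stride] + rhs[k]);
      un[k] = (1.0 - w) * uo[k] + w * update;
    }
  }
}
\end{verbatim}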
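
Because the solve-phase kernels are memory-bandwidth bound, the performance expectation referenced above can be summarized as a bandwidth ratio; the bandwidths are left symbolic here, as the measured values are reported in the body of the dissertation:
\[
  \text{expected speedup} \approx \frac{B_{\text{V100}}}{B_{\text{P9}}},
\]
where \(B_{\text{V100}}\) and \(B_{\text{P9}}\) denote the achievable memory bandwidth of the V100 GPU and the Power9 CPU, respectively.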