Performance enhancements to NASA's recently developed lattice Boltzmann solver within the Launch Ascent and Vehicle Aerodynamics (LAVA) framework are presented. Two key algorithmic developments are highlighted. First, a coarse-fine interface treatment that discretely conserves mass and momentum has been implemented, verified, and validated. Second, code optimizations targeting improved serial and parallel performance are described. For a simple turbulent Taylor-Green vortex problem, we demonstrate a 2.3 times speedup over the baseline code on a single Skylake-SP node containing 40 physical cores, and a 2.14 times speedup on 64 nodes containing 2560 physical cores. In addition, we show that the optimizations allow the code to scale almost perfectly to 20480 physical cores, where the problem size, including ghost cells, was 10 billion cells.
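As a reading aid, the core counts and problem size quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the LAVA code: the helper name `nodes_for` and the variable names are hypothetical, the "10 billion cells" figure is treated as an exact round number even though the abstract gives it only approximately, and the ratio of the two speedups is a derived quantity introduced here for illustration, not a metric reported by the authors.

```python
# Back-of-the-envelope checks on the figures quoted in the abstract.
# All names here are illustrative; only the numeric inputs come from the text.

CORES_PER_NODE = 40  # physical cores per Skylake-SP node, as stated in the abstract


def nodes_for(cores: int) -> int:
    """Return the number of full nodes implied by a physical core count."""
    if cores % CORES_PER_NODE != 0:
        raise ValueError("core count is not a whole number of nodes")
    return cores // CORES_PER_NODE


# Node counts implied by the quoted core counts.
nodes_mid = nodes_for(2560)     # the 64-node case
nodes_large = nodes_for(20480)  # the largest scaling case

# Approximate cells per core at the largest scale (ghost cells included),
# assuming the problem size is exactly 10 billion cells.
TOTAL_CELLS = 10_000_000_000
cells_per_core = TOTAL_CELLS / 20480

# Fraction of the single-node speedup (2.3) that is retained on 64 nodes (2.14).
speedup_retention = 2.14 / 2.3

print(f"nodes at 2560 cores:  {nodes_mid}")
print(f"nodes at 20480 cores: {nodes_large}")
print(f"cells per core:       {cells_per_core:,.2f}")
print(f"speedup retention:    {speedup_retention:.3f}")
```

Under these assumptions, the largest run corresponds to 512 nodes with roughly half a million cells per core, and the 64-node speedup is about 93% of the single-node speedup.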