We describe a series of experiments in which artificial evolution is used to develop neural-network sensory-motor controllers for software animats that exhibit collective movement behaviours in three dimensions. We successfully evolved controllers that displayed simple group behaviours such as dispersal and aggregation. However, every attempt to evolve realistic-looking schooling behaviour failed. The problem appears to stem from the difficulty of formulating an evaluation function that captures what constitutes schooling. We argue that formulating an effective fitness evaluation function for evolving controllers can be at least as difficult as hand-crafting an effective controller design. Although this paper concentrates on schooling, we believe the issue is likely to be general: it is a serious problem that can be expected to arise across a variety of problem domains.
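To illustrate why the simple behaviours are comparatively easy to specify, each can be scored with a single geometric measure over the animats' 3D positions. The sketch below is hypothetical and is not the evaluation function used in our experiments. The function names `aggregation_fitness` and `dispersal_fitness` are illustrative assumptions: aggregation is rewarded by a small mean distance to the group centroid, and dispersal by a large mean nearest-neighbour distance.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: each animat is a point (x, y, z) in 3D space.
Vec3 = Tuple[float, float, float]


def centroid(positions: Sequence[Vec3]) -> Vec3:
    """Mean position of the group."""
    n = len(positions)
    return (
        sum(p[0] for p in positions) / n,
        sum(p[1] for p in positions) / n,
        sum(p[2] for p in positions) / n,
    )


def aggregation_fitness(positions: Sequence[Vec3]) -> float:
    """Hypothetical score: higher (less negative) when animats cluster tightly.

    Negated mean Euclidean distance from each animat to the group centroid.
    """
    c = centroid(positions)
    mean_dist = sum(math.dist(p, c) for p in positions) / len(positions)
    return -mean_dist


def dispersal_fitness(positions: Sequence[Vec3]) -> float:
    """Hypothetical score: higher when animats keep away from one another.

    Mean distance from each animat to its nearest neighbour (needs >= 2 animats).
    """
    total = 0.0
    for i, p in enumerate(positions):
        nearest = min(math.dist(p, q) for j, q in enumerate(positions) if j != i)
        total += nearest
    return total / len(positions)


# Usage: a tight group versus the same shape scaled up by a factor of 10.
tight: List[Vec3] = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
spread: List[Vec3] = [(10 * x, 10 * y, 10 * z) for (x, y, z) in tight]

print(aggregation_fitness(tight) > aggregation_fitness(spread))  # True
print(dispersal_fitness(tight), dispersal_fitness(spread))        # 1.0 10.0
```

No single measure of this kind obviously captures schooling. A candidate score would have to combine cohesion, spacing, and heading alignment, and it is not clear how those terms should be weighted so that maximising the score yields realistic-looking behaviour rather than a degenerate solution.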