One potential disadvantage of social tagging systems is that, owing to the lack of a centralized vocabulary, a crowd of users may never reach a consensus on the description of resources (e.g., books, images, users or songs) on the Web. Yet previous research has provided interesting evidence that the tag distributions of resources in social tagging systems may become semantically stable over time, as more and more users tag them and implicitly agree on the relative importance of tags for a resource. At the same time, previous work has raised an array of new questions, such as: (i) How can we assess semantic stability in a robust and methodical way? (ii) Does semantic stabilization vary across different social tagging systems? And ultimately, (iii) what factors can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method that overcomes a number of limitations of existing methods, (ii) empirically investigating semantic stabilization in different social tagging systems with distinct domains and properties, and (iii) identifying potential causes of stabilization and implicit consensus, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams generated by a combination of imitation dynamics and shared background knowledge stabilize faster and reach a higher degree of semantic stability than tagging streams generated by imitation dynamics or natural language phenomena alone.