This dissertation investigates how natural languages evolve and what that evolution means for the broader cultural evolution of society. Advances in computation and modeling have made it possible to explore large-scale text data and to test hypotheses about language evolution. Like biological systems, natural languages are evolving systems, with words as their measurable units. Words serve particular functions within a body of text to convey ideas and thoughts. The frequency distribution of these words can change depending on how they are used at a particular point in time and in a particular context. This work draws on two sources of text data: 109 years' worth of digitized books and text taken from social media. Given these large-scale diachronic corpora, this work focuses on three topics: (1) modeling word rank evolution using statistical and data-driven modeling approaches; (2) exploring the evolution of contextual semantics using distributional semantics and word embedding models; and (3) evaluating accuracy on reading comprehension tasks using contemporary machine learning models.
The evolution of language presents a profound problem in natural language processing and cognitive science. The social and cultural aspects of language add to the problem of how word meanings develop and change. Examining how language evolves, while drawing on both molecular biology and socio-cultural perspectives, allows us to explore how word meanings form under the influence of political and social change.