February 21, 2024

Digital Linguistics – How a Love of Language Inspired a Career in Data Science

Emily Williams
Lead Data Scientist, AT&T


Becoming a data scientist was never part of the plan.

For as long as I can remember, language has been a great love of my life. Reading it, writing it, understanding it. When I was young, I excelled in that collection of skills we called “Language Arts,” and as a young adult I developed both a passion and a talent for learning new languages. As an undergraduate, I naturally gravitated toward English as my major. What better place to explore the power and beauty of words?

My career began with teaching a language, English, to middle school students in South Korea. I was only five years away from starting a career as a data scientist, but it was a job title that wasn’t yet in my vocabulary.

The opportunity to share my language while also learning a new one felt like a dream, so I set out to enhance my skills and earn a Master’s degree in Applied Linguistics. There, language surprised me again. I learned about language and identity, language and power, language and change. Language, it turned out, was more than a communication tool. It was woven into the very fabric of our society.

I decided to pursue a PhD, of course. It felt like the only path for someone who loved that wonderful combination of sociology and language study that we call linguistics. Then, after a few short months in my doctoral program, the absolutely unforeseen happened: I found myself among a cohort of researchers who were desperately trying to learn to code.

Programming is finding its way into many academic disciplines, but it is a particularly fine companion for the study of linguistics. Language, after all, is being mass-produced online. The volume and velocity at which textual language data is being created demand new, faster ways of studying it. This field of study goes by many names (Computer-Mediated Communication, Corpus Linguistics, Internet Linguistics, Digital Linguistics), but all variations of the discipline seem to agree: programming is necessary to effectively study the rapidly evolving world of language.

I learned to program. Within six months, I found myself with a data science internship at AT&T. It seemed like an odd match at first. My research focus was on the field of pragmatics, the study of language in context. Pragmatics is concerned with what we say without saying it: all the layers of communication that we maintain at once. These include assumptions, background information, and references to earlier points in the conversation, or to earlier points in our lives. My research has spanned numerous subfields of pragmatics (presupposition, implicature, speech acts), but it has focused on how we communicate context online. What are the main ideas, what does the language look like, and what can word choice and sentence structure tell us about how and what people are communicating?

At AT&T, when I had the opportunity to derive key topics and insights from large bodies of unstructured text data, it felt like a direct extension of the kinds of research I was doing at school. I realized, very quickly and with great delight, that I could actually practice linguistics in a setting that would have an impact on a business. That I could combine my newfound love of programming with my lifelong love of language.

My focus within AT&T has changed several times: chatbots, natural language processing, statistics, machine learning, data insights. Now, as Generative AI takes center stage, I feel that in some ways I have come full circle. These days I’m focused on summarization, helping Ask AT&T learn to summarize documents by condensing complex thoughts and concepts into brief phrases. In the end, it has all come back to language. And I feel profoundly lucky to work in an environment where, as the world and its language change, we are at the forefront, exploring and tackling new challenges.