John M. Olin Library, Instruction Room 3

Introduction to Text Analysis in Python

This four-session course introduces participants to analyzing textual data using Python. We will begin by learning how to perform simple operations on text and convert text into data. Topics include working with strings, text preprocessing, NLP tasks (e.g., stemming and tokenizing), and representing text as data (e.g., bag-of-words, word embeddings). The course will then introduce methods for measuring concepts using textual data and provide an overview of rule-based techniques, supervised learning, and unsupervised learning approaches. Specifically, we will cover dictionary-based methods and the application of Naive Bayes, Random Forests, and SVMs to text classification.
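
For readers unfamiliar with the terms above, here is a minimal sketch of tokenizing and a bag-of-words representation, written in plain Python. It is only an illustration and is not taken from the course materials: the function names tokenize and bag_of_words and the two sample sentences are hypothetical, and the workshop may well use dedicated libraries instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, rows); each row counts the vocabulary words in one document."""
    tokenized = [tokenize(doc) for doc in docs]
    # The vocabulary is the sorted set of all tokens seen in any document.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)  # a missing word counts as 0
        rows.append([counts[word] for word in vocab])
    return vocab, rows


# Hypothetical sample documents, for illustration only.
docs = ["The cat sat on the mat.", "The dog chased the cat."]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each document becomes a row of word counts over a shared vocabulary, which is the kind of numeric input that classifiers such as Naive Bayes can work with.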

This course is intended for graduate students, faculty, and staff from any field at WashU who are interested in learning about quantitative text analysis and would like to become familiar with the main libraries and functions used to work with textual data in Python. Participants are expected to have basic proficiency in Python (completing the Introduction to Python training series 1 and 2 should be sufficient).

This class will be fully in-person, and participants will use their own laptops.

Dates for the Introduction to Text Analysis in Python series (all sessions held in Olin Library, Instruction Room 3, from 2–3:30 pm):
  • Tuesday, March 18
  • Wednesday, March 19
  • Tuesday, March 25
  • Wednesday, March 26

DataLab Workshops 

DataLab is a collaboration among Data Services, TRIADS, Bernard Becker Medical Library, TechDen, and DI2 that offers a range of workshops, from the basics of understanding data to working with data tools. These workshops are open to all WashU affiliates and are held in the fall and spring semesters.