Love Data Week (LDW) is an international event to raise awareness about data science. What do I mean by data science? LDW presenter Dr. Xiao-li Meng described data science as a broad and encompassing area, one so big that it couldn’t be trapped in a single department of a given school in the academy but should be a school in its own right. Dr. Meng encouraged us all to see ourselves as data scientists, because data is in everything (see more on Meng’s talk below).
LDW workshops took place at the Danforth and Washington University School of Medicine campuses. Most of these workshops were not only at capacity but also had as many people (or more) on the waiting list as they had confirmed registrants. This demonstrates the high demand for these types of workshops, so be on the lookout for more. The workshops included:
REDCap is a web-based tool that allows researchers to create databases and surveys suitable for a variety of studies. The Intro to REDCap workshop presenter, Christopher Sorensen, walked participants through the basics of creating a data collection instrument. Sorensen showed how to use the tool to collect data in a way that helps users manage and secure it.
Tableau is a quick and powerful tool for visualizing data and delivering it in a sophisticated way. Manan Shroff, a data scientist from the Office of the Vice Chancellor for Research, led participants through an exercise to recreate a sleek dashboard by Ryan Sleeper. Tableau allows users to visualize multiple sheets and variables and display them in a dashboard. Users can also create data stories by combining multiple dashboards.
In the R vs. Python workshop, participants learned how to write scripts to do similar tasks in each of these programming languages. R and Python are both tools for cleaning, organizing, and analyzing data, but each has specific strengths. This workshop was jointly presented by Mollie Webb and Dorris Scott from Data Services on the Danforth Campus and Marcy Vana and Maze Ndukum from Data and Computing Services on the Medical School campus. This fast-paced, energy-filled workshop left attendees craving an intermediate follow-up.
AI, the Beatles, and Election: a Nano Tour of Data Science
Our LDW keynote was delivered by Xiao-li Meng, professor of statistics and editor of the Harvard Data Science Review. In his engaging and energetic presentation, Meng challenged us all to consider the reliability of the data we collect, share, and consume. He talked about the many perspectives from which data can be viewed: those of researchers, administrators, and educated citizens. Meng described how artificial intelligence (AI) is often misconstrued as anything involving computers. Yet AI originally meant computers that understand like humans. Meng rejects this idea because processing and understanding are not one and the same. He described a paper by Michael I. Jordan which suggests we should instead pursue IA (intelligence augmentation), where computers complement the thinking of humans.
Data science has a big role to play in IA but is often misconstrued as just machine learning, or just statistics, or just prediction, or just analysis, or as something confined to STEM fields. The impact that data science can have on our lives and our careers is very appealing, and this led Meng to address the reproducibility crisis. Why is there a crisis? Because research that doesn’t show impact is not interesting to publish or promote. Our desire to produce impact may lead us to work our research over until we find the results we want: the impactful ones. Meng said, “If you torture the data enough, it will confess!” If researchers thoroughly documented and shared their data or code, this kind of cherry-picking would be discouraged. Often, however, researchers don’t do this, or don’t do it well enough, which feeds the crisis by allowing them to take shortcuts and create impact where it might not exist.
Meng described some of the pitfalls that researchers are unlikely to want to document (e.g., reporting an absolute sample size rather than a relative one, or having a lot of noise in the data). Meng also described how the buzz around Big Data compounds the problem at scale. He reminded us that big data also has big noise. Data quality is much more important than data size.
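To make the quality-versus-size point concrete, here is a minimal Python sketch. It is our own illustration, not an example from Meng's talk, and every number in it is a made-up assumption: a hypothetical population of 200,000 people split evenly between "yes" and "no," a large sample in which "yes" people are more likely to respond (one form of poor data quality), and a small sample drawn uniformly at random.

```python
import random


def simulate(seed=0):
    """Compare a large biased sample with a small random one (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical population: exactly half answer "yes" (1), half "no" (0).
    population = [1] * 100_000 + [0] * 100_000
    true_mean = sum(population) / len(population)

    # Big but low-quality sample: "yes" people respond 70% of the time,
    # "no" people only 50% of the time (assumed response rates).
    big_sample = [
        x for x in population
        if rng.random() < (0.7 if x == 1 else 0.5)
    ]

    # Small but high-quality sample: 1,000 people chosen uniformly at random.
    small_sample = rng.sample(population, 1_000)

    # Absolute error of each sample's estimate of the "yes" share.
    big_error = abs(sum(big_sample) / len(big_sample) - true_mean)
    small_error = abs(sum(small_sample) / len(small_sample) - true_mean)
    return len(big_sample), big_error, len(small_sample), small_error


if __name__ == "__main__":
    n_big, big_error, n_small, small_error = simulate()
    print(f"big biased sample:   n={n_big}, error={big_error:.3f}")
    print(f"small random sample: n={n_small}, error={small_error:.3f}")
```

Under these assumptions, the large sample is more than a hundred times the size of the small one, yet its estimate lands farther from the truth. Collecting more data does not remove the bias in how that data was collected.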
Meng credited Christine Borgman, Distinguished Research Professor of Information Studies at UCLA, with introducing him to the concept of data curation. Data curation is the process by which data, code, and documentation are reviewed, augmented, and transformed to ensure they are at least reusable and at best replicable. We often talk about making data FAIR (findable, accessible, interoperable, reusable) through the curation process. Data curators are also data scientists who work hand-in-hand with researchers.
Dr. Xiao-li Meng’s talk inspired meaningful questions and thoughtful reflection from many of us. Following the talk at Danforth, Data Services participated in an activities fair organized by the University Libraries’ Student Engagement Committee. We set up a VR headset and invited students to blow off some steam by playing our virtual bow and arrow game.
3rd Annual ICTS Symposium and Poster Session
The final day of LDW was celebrated at the 3rd Annual ICTS Symposium and Poster Session, “Building a Learning Healthcare System: From Lab to Laptop,” featuring Patricia F. Brennan, RN, PhD, director of the National Library of Medicine (NLM). Brennan talked about how various aspects of data science are being used at NLM to enhance the library’s processes and services. For example, NLM is currently leveraging machine learning to improve the relevance ranking of results returned when searching NLM’s PubMed biomedical literature database, used by millions around the world on a daily basis.
Like Meng, Brennan emphasized the importance of seeing data science as something beyond just data analysis and highlighted the essential role of data curation and data management. In addition, Brennan conveyed her view of the pivotal role libraries play in data science, challenging current and future library staff to continue to hone these skills as integral collaborators in this space.