Washington University Libraries and the School of Law Interdisciplinary Environmental Clinic hosted a data rescue event for the university community on April 14.
Recently, a number of data rescue events have been held nationwide through the Environmental Data & Governance Initiative (EDGI) and DataRefuge to help preserve information from government agencies for the future. Federal agencies do not destroy data, but most do not have a mandate to curate and preserve data for long-term access, especially after an agency or commission no longer exists. You can learn more about these issues on the DataRescueWU libguide.
Approximately 40 people attended DataRescueWU for all or part of the day. Attendees included students, faculty, and staff from across the University, representing Earth & Planetary Sciences, Social Work, Biology, Computer Science, Systems Engineering, Anthropology, the Tyson Research Center, and more. Additionally, a number of participants came from the local community, including the University of Missouri, Columbia.
The DataRefuge workflow allows participants to choose roles appropriate to their expertise: seeding URLs for web crawling or harvesting, researching content and websites, and harvesting data that was found to be uncrawlable.
Despite the single-day time frame for the DataRescueWU event, participants were able to nominate over 50 sites for crawling by the Internet Archive or, if uncrawlable, for harvesting and archiving through the data rescue workflow. The attendees also researched content from 30 locations (often diving 2-3 links deep) and harvested data from 4 links (over 300 million records, approximately 20 GB). Content harvested included water quality data, fish and wildlife data, climate data, and more.
The harvested data and documents will be stored and backed up through Amazon Web Services and the DataRefuge repository. These harvested datasets will serve as a backup to the federal sites, ensuring the ongoing accessibility of the data through government sequestrations, government priority changes, and administration turnover.
While official feedback about the event is still being collected, informal feedback indicated a positive experience; many participants didn’t even stop for lunch. One participant who was harvesting said, “This is hard, but it’s so fun.” A wrap-up discussion at the event indicated strong interest among participants in hosting another DataRescueWU event, possibly in collaboration with other regional libraries. If you are interested in participating in or helping to plan a DataRescueWU event, please email us at datarescueWU@gmail.com.