Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success
Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success by Kristin Briney is a highly practical resource that covers key data management issues, such as planning, documentation, organization, methods for improving analyses, treatment of sensitive data, storage, sharing, and digital preservation. It opens by justifying the concern for data management and underscoring its importance, and moves into the data life cycle and roadmap, planning, acquisition, analysis, sharing, preservation, and reuse.
Many versions of the data life cycle exist, with varying levels of complexity, but this one describes the essential stages at a high level. Briney does a good job of connecting these stages to chapters in the book.
Each chapter covers the key considerations for its stage. The planning chapter makes a case for data management and discusses how to create a plan with pertinent policies in mind. The key to transparent, reusable, and reproducible research is documentation. The documentation chapter begins by covering lab notebooks, including electronic lab notebooks (ELNs), their function and use, and how to get what you need from them. It then covers the documentation of methods, including data definitions and protocols, and addresses “readme” files, data dictionaries, code books, metadata, and standards.
The organization chapter focuses on both digital and physical content, including hierarchies, file naming conventions, versioning, a description of relational databases, and the basics of creating a query. The analysis chapter addresses the difference between working with raw, analyzed, or processed data (derivatives). It also addresses key data wrangling topics, including quality control, error checking, consistency, best practices for tables, coding, code sharing, and standardization.
Briney covers the particularly difficult topic of sensitive data, including anonymization of direct and indirect personal identifiers. While the chapter on storage and backups is not the most earth-shattering, it does a solid job of laying out concerns and recommended practices. What is exciting is the chapter on long-term storage and preservation, which emphasizes just by its inclusion that these are intentional rather than implicit actions. It covers how long data should be retained, for what purpose, and the policies governing these actions. Briney then talks about what to do to prepare data for long-term storage and preservation, a process that Data Services at WashU, and many other institutions, refer to as data curation.
Briney wraps up the book with data sharing concerns and reuse, closing with a return to the data life cycle. I highly recommend this book, not only for the practical advice, but also for the contextual framework in which it’s presented.