StatTag and StatWrap for Conducting Collaborative Reproducible Research

Challenges of reproducible research

Practicing reproducible research is important but increasingly complex as studies involve more data, more statistical code, and larger teams. Adopting reproducible research workflows can be especially daunting for research teams with diverse needs, skills, and expectations for software tools.

For example, in medical research, most manuscripts are prepared in Microsoft Word, leaving clinicians to copy and paste, or even re-type, statistical estimates into Word documents. Statisticians, in contrast, may use R Markdown or Jupyter Notebooks to generate reports that weave statistical results together with interpretation, but their collaborators may be unwilling to draft manuscripts in these programs. In addition, teams may struggle to communicate and keep track of information such as: Who worked on the analyses, when, and what decisions did they make? Where is the most recent data? What are the code file dependencies and code libraries?

Get to know StatTag and StatWrap

This talk will describe two software tools designed to address these problems, StatTag and StatWrap, both of which grew out of the challenges of conducting collaborative research in an academic health center. StatTag addresses the need to integrate document preparation in Microsoft Word with statistical code and results from R, Stata, SAS, or Python. StatWrap is an assistive, non-invasive discovery and inventory tool that documents the evolution of a project by combining automatically collected metadata (e.g., statistical packages, code file dependencies), investigator-supplied documentation (e.g., analysis notes, personnel), and source control. Both StatTag and StatWrap are free, open-source programs designed to promote reproducible research, especially for collaborative teams.

This event takes place on Thursday, September 28, from 10:15 to 11:15 am as part of the colloquium of the Department of Statistics at LMU Munich. Everyone interested is invited to register to join the colloquium online.

About the instructor

Dr. Welty’s research interests include the application of biostatistics to psychiatry and environmental research and the development of software tools for reproducible research. She leads the development team for StatTag, free, open-source software connecting Microsoft Word to R, SAS, and Stata. She is also the lead biostatistician for the Northwestern Juvenile Project, a large-scale longitudinal study of psychiatric disorders and risky behaviors in delinquent youth, as well as for NJP: NextGen, a study of the children of the original Northwestern Juvenile Project participants. She directs the Biostatistics Collaboration Center, Feinberg’s core biostatistics resource for non-cancer research. (Northwestern University 2023)

DSSGx Lecture: Analyzing Open-ended (Audio) Survey Responses: Insights from a research project

Join online sessions from the Data Science for Social Good (DSSG) lecture series for free!

Data Science for Social Good (DSSG) is a two-month fellowship program in which aspiring data scientists work on projects with a positive societal impact. In the DSSGx Munich 2023 Lecture Series, top researchers and experts share their insights on applying data science for social good. Parts of this series can be attended online for free.

Paul Bauer is a Research Fellow and Project Director at the Mannheim Centre for European Social Research.

All open online sessions of the DSSGx Munich 2023 lecture series:

August 17, 2023: DSSG Berlin (Hasan Shaukat)

August 24, 2023: Democratizing our Data (Julia Lane)

August 31, 2023: Responsible AI to Benefit Society (Kit Rodolfa)

September 14, 2023: Open Science @ MCML (Moritz Herrmann)

September 21, 2023: Analyzing Open-ended (Audio) Survey Responses: Insights from a research project (Paul Bauer)