Modernizing data science workflows for clinical trials

How GSK modernized its statistical computing environment

Key Takeaways

Agile accelerates clinical trial reporting

GSK replaced waterfall with Scrum and Jira to increase delivery speed and flexibility in biostatistics workflows, enabling faster trial reporting and better response to changing regulatory and study needs.

Integrated platforms ensure compliance in biostatistics

Combining GitHub (version control) and Domino (statistical computing) creates a traceable, audit-friendly environment that supports pharmaceutical regulatory compliance for collaborative data science teams.

Organizational change enables statistical computing transformation

GSK’s platform shift required cultural change as well as tooling. Training and enablement helped statisticians transition from traditional methods to modern, collaborative workflows that support regulated clinical analysis.

Enabling scale and speed in clinical trials

GSK’s biostatistics team supports hundreds of clinical trials, with more than 1,300 statisticians and programmers responsible for analyzing trial data and delivering validated outputs for regulatory submission. Historically, this work was done using legacy systems and manual, waterfall-based workflows. These methods didn’t scale easily and couldn’t meet the growing demands of modern drug development.

Laying the foundation for scalable, auditable clinical reporting

GSK adopted Domino as the foundation for its statistical computing environment (SCE), integrating it with GitHub for version control and Jira for agile project management. Together, these tools support a streamlined clinical reporting workflow that can handle a continuous flow of trial data, sometimes arriving daily. Every phase is versioned and auditable, enabling full traceability across the lifecycle of a study. The redesign focused on improving traceability, reproducibility, and speed while maintaining the rigor required in a regulated environment.

Key phases of GSK’s clinical reporting workflow:

  • Pre-analysis: Ingesting raw clinical data
  • Analysis: Statistical programming and QC
  • Controlled execution: Validated and auditable outputs for submission
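
As a rough illustration of how three phases like these can each leave an auditable record, here is a minimal Python sketch. It is not GSK's implementation and does not use the Domino, GitHub, or Jira APIs. The names AuditLog, pre_analysis, analysis, controlled_execution, and run_pipeline are hypothetical, the data is a toy example, and the SHA-256 hashing scheme is an assumption chosen only to show the idea of traceability.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Hypothetical in-memory audit trail: one entry per completed phase."""
    entries: list = field(default_factory=list)

    def record(self, phase: str, payload) -> str:
        # Hash a canonical JSON form so identical data always yields the same digest.
        digest = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()
        self.entries.append({"phase": phase, "sha256": digest})
        return digest


def pre_analysis(raw_rows):
    """Ingest raw records, dropping rows that have no measurement."""
    return [r for r in raw_rows if r.get("value") is not None]


def analysis(rows):
    """Toy statistical step: mean value per treatment arm."""
    by_arm = {}
    for r in rows:
        by_arm.setdefault(r["arm"], []).append(r["value"])
    return {arm: sum(v) / len(v) for arm, v in by_arm.items()}


def controlled_execution(summary):
    """Freeze the output for submission at a fixed precision and arm order."""
    return {arm: round(mean, 2) for arm, mean in sorted(summary.items())}


def run_pipeline(raw_rows):
    """Run the three phases in order, logging a digest after each one."""
    log = AuditLog()
    data = pre_analysis(raw_rows)
    log.record("pre-analysis", data)
    summary = analysis(data)
    log.record("analysis", summary)
    output = controlled_execution(summary)
    log.record("controlled-execution", output)
    return output, log
```

Because each digest depends only on the phase's data, rerunning the pipeline on the same input reproduces the same log, which is the property an auditor would check.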

Daisy chains, not copy-paste

One of the most transformative changes was the move to a “daisy chain” Git branching model. In the past, analysts would often copy code between folders to begin a new milestone analysis. Now, outputs from earlier phases (such as a primary analysis) feed directly into the next, reducing duplication, ensuring consistency, and saving valuable time across multi-year trials. This model also made it easier to manage parallel ad-hoc analyses, such as when medical teams requested additional insights midway through a study. Rather than creating forks in isolation, teams could decide when and how to integrate ad-hoc work back into the main analysis chain.
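
The daisy-chain idea can be sketched as a linked chain of milestones, where each milestone inherits the code of the one before it instead of starting from a copy. The Python below is a simplified model under stated assumptions, not GSK's branching setup: it makes no real Git calls, and Milestone, resolve, lineage, and integrate are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Milestone:
    """Hypothetical stand-in for a branch in the chain."""
    name: str
    parent: Optional["Milestone"] = None
    # Files changed at this milestone only: file name -> content.
    own_code: dict = field(default_factory=dict)

    def resolve(self) -> dict:
        """Full code set: inherited files overlaid with this milestone's changes."""
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.own_code}

    def lineage(self) -> list:
        """Names from the first milestone in the chain down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))


def integrate(target: Milestone, adhoc: Milestone) -> None:
    """Fold an ad-hoc branch's changes into the main chain when the team decides to."""
    target.own_code.update(adhoc.own_code)


# The final analysis builds on the primary analysis rather than copying it.
primary = Milestone("primary-analysis", own_code={"derive.py": "v1", "tables.py": "v1"})
final = Milestone("final-analysis", parent=primary, own_code={"tables.py": "v2"})

# An ad-hoc request branches off the same point and stays separate until integrated.
adhoc = Milestone("adhoc-subgroup", parent=primary, own_code={"subgroup.py": "v1"})
```

In this model, final.resolve() picks up derive.py from the primary analysis without duplication, and calling integrate(final, adhoc) is the explicit step that brings the ad-hoc work into the main chain.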

“Technology should not be limited by legacy processes. If we want to future-proof how we work, especially in a regulated environment, we have to let modern tools shape the way we collaborate, manage risk, and deliver faster with confidence.”

Eleanor Sparling

Principal Data Scientist

Shifting from siloed coding to team collaboration

The transition wasn’t just technical; it was also cultural, because most team members had no previous experience with Git or agile methodologies. Through focused enablement, workshops, and user-led communities, GSK helped its teams adopt task-based workflows, embrace pull requests, and reduce reliance on individual code ownership.

From double programming to risk-based QC

GSK also began to challenge long-standing industry norms around quality control. While double programming remains a standard approach, the team is exploring when structured code review may provide equivalent assurance with less redundancy. In parallel, they are experimenting with AI-powered copilots to assist with programming tasks and support the transition from SAS to open-source languages like R and Python.
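
For readers unfamiliar with double programming, the core of it is that two programmers independently produce the same result and the outputs are compared. The Python sketch below shows that comparison step on a toy statistic. It is an illustration only: the function names, the example statistic, and the tolerance values are assumptions, and it does not represent GSK's QC procedure or criteria.

```python
import math
import statistics


def production_mean_change(baseline, followup):
    """Production implementation: average the per-subject changes."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return sum(changes) / len(changes)


def qc_mean_change(baseline, followup):
    """Independent QC implementation: difference of the two means."""
    return statistics.fmean(followup) - statistics.fmean(baseline)


def compare_outputs(primary, qc, rel_tol=1e-9, abs_tol=1e-12):
    """Return {key: (primary_value, qc_value)} for every result that disagrees.

    A key missing from either side counts as a mismatch. Tolerances are
    illustrative assumptions, not a regulatory standard.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(qc)):
        a, b = primary.get(key), qc.get(key)
        if a is None or b is None or not math.isclose(
            a, b, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches[key] = (a, b)
    return mismatches
```

An empty result from compare_outputs means the two implementations agree. Under a risk-based approach, a team might reserve this kind of full independent reimplementation for higher-risk outputs and rely on structured code review elsewhere.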

Results and outlook

In under two years, GSK successfully migrated all clinical trials into GitHub-managed workflows. With version control and built-in audit trails, the team has adopted a more agile and reproducible way of working that meets evolving compliance expectations.
