9 October 2025

Building Singapore’s Genomic Data Pipeline with PRECISE to Advance Precision Medicine

How Temus employed large-scale genomic data processing and cloud infrastructure to build a pipeline that processes 100,000+ genomes, cutting processing time from weeks to days

Temus’ Collaboration with PRECISE

Singapore’s National Precision Medicine Programme (NPM), led by Precision Health Research, Singapore (PRECISE), is one of the region’s most ambitious population genomics programmes. PRECISE’s Genomics Solutions Unit partnered with Temus to build data infrastructure capable of processing the programme’s 100,000+ genomes (~150 TB total data volume), dramatically reducing processing time from weeks to days while preparing for petabyte-scale data management.

At the heart of this work lay an opportunity to optimise a sophisticated data infrastructure, one that demanded specialised expertise in genomics, cloud computing, and large-scale data processing from the technology enabler. Temus was selected as the technology partner to design and implement the critical data pipeline infrastructure that would enable PRECISE to process, analyse, and share genomic data at unprecedented scale. Throughout the development process, Temus worked closely with PRECISE’s genomics experts to ensure the pipeline met stringent scientific and regulatory requirements whilst maintaining the flexibility needed for diverse research applications.

As of August 2025, we have built a private, scalable pipeline comprising multiple bioinformatics tools that processed the genomes of about 100,000 people into summary statistics in just two weeks, enabling population-specific insights that could uncover causal biology. This achievement was made possible through the tight integration between Temus’ technical capabilities and PRECISE’s deep genomics expertise, creating a truly collaborative approach to solving complex computational challenges in precision medicine.

Photo below: Asst Prof Max Lam (standing) sharing thoughts on the PRECISE initiative with the wider Temus AI and Data and Health teams in a brown bag session.

What is Large-Scale Genomic Data Processing and Why is it Important?

Large-scale genomic data processing involves the systematic analysis of DNA sequences from thousands to hundreds of thousands of individuals simultaneously. Each human genome contains approximately 3 billion base pairs, generating 100-200 gigabytes of raw data per individual. At population scale, this creates massive computational challenges requiring:

  • Parallel processing of thousands of genomes simultaneously
  • Quality control pipelines ensuring data accuracy across millions of genetic variants
  • Association analysis identifying genetic patterns linked to diseases and traits
  • Secure data management protecting sensitive genetic information
  • User-friendly visualisation enabling researchers to explore population-wide genetic insights
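To make this scale concrete, here is a back-of-envelope calculation using the per-genome figures above. The single-node throughput and worker count are illustrative assumptions, not figures from the actual pipeline:

```python
# Back-of-envelope scale estimate for population genomics.
# Per-genome figures come from the text above; throughput and worker
# counts are hypothetical, chosen only to show why parallelism matters.

GENOMES = 100_000
RAW_GB_PER_GENOME = 150          # midpoint of the 100-200 GB raw-data range
raw_pb = GENOMES * RAW_GB_PER_GENOME / 1_000_000  # petabytes of raw reads

# Hypothetical single-node throughput: one genome per 4 hours.
HOURS_PER_GENOME = 4
serial_years = GENOMES * HOURS_PER_GENOME / 24 / 365

# With 1,000 parallel workers the same job takes weeks, not decades.
parallel_days = GENOMES * HOURS_PER_GENOME / 1_000 / 24

print(f"{raw_pb:.0f} PB raw data")                     # → 15 PB raw data
print(f"{serial_years:.0f} years serially")            # → 46 years serially
print(f"{parallel_days:.1f} days with 1,000 workers")  # → 16.7 days with 1,000 workers
```

Even with generous assumptions, serial processing is infeasible at this scale, which is why every stage of the pipeline is built around parallelism.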

Large-scale genomic data processing is revolutionising healthcare and medical research by enabling precision medicine, accelerating drug discovery, and supporting population health initiatives, including preventive care. For Singapore specifically, processing 100,000 genomes provides unprecedented insights into Asian genetic diversity, supporting the development of treatments optimised for local populations and establishing the foundation for personalised healthcare systems.

How Did We Do?

  • Successfully processed 100,000+ whole genome sequences (~ 150TB total data volume)
  • A single Nextflow analysis pipeline that chains PLINK, PLINK2, REGENIE, METAL, and other workflows into a configurable, scalable, and interoperable whole
  • Interactive analysis capabilities allowing exploration of 100,000+ genome association studies

Temus brought the specialised cloud computing and large-scale data processing expertise we needed to build infrastructure capable of handling population-level genomics. Their team worked closely with our genomics experts to design and build a pipeline that met both our stringent scientific requirements and regulatory standards. The result speaks for itself: we’ve dramatically reduced processing time from weeks to days for 100,000 genomes—approximately 150TB of data—unlocking population-specific insights into Asian genetic diversity that are critical to Singapore’s National Precision Medicine Programme. This integration of technical capabilities and our deep genomics expertise exemplifies the kind of public-private collaboration essential to advancing precision medicine innovation and translating genomic research into tangible healthcare benefits for Singaporeans.

 

Asst Prof Max Lam

Chief Technology Officer

PRECISE

What We Did

Temus engineered and deployed scalable data pipelines leveraging AWS’s processing capabilities:

Infrastructure Development and Scale:

  • Hail, an open-source framework for exploring and analysing genomic data, natively integrated with AWS EMR to enable parallel processing of thousands of genomes simultaneously
  • Elastic scaling architecture automatically adjusting compute resources based on data volume
  • Genomic analysis tools containerised using Docker for consistent, reproducible processing across massive datasets
  • AWS ECR integration ensuring seamless deployment of custom genomic algorithms at enterprise scale
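The parallel pattern underlying this infrastructure is scatter-gather: fan each genome out to a worker, then collect the results. The actual pipeline runs Hail jobs on an EMR cluster; the sketch below uses only the Python standard library, with a thread pool standing in for worker nodes. The sample IDs, call rates, and QC threshold are all invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy per-sample QC: flag genomes whose variant call rate falls below a
# threshold. In the real pipeline this kind of step runs as Hail jobs on
# an AWS EMR cluster; here a thread pool stands in for the worker nodes.

CALL_RATE_THRESHOLD = 0.95  # illustrative cut-off, not PRECISE's actual value

def qc_sample(sample):
    """Return (sample_id, passed) for one genome's call-rate check."""
    sample_id, call_rate = sample
    return sample_id, call_rate >= CALL_RATE_THRESHOLD

def run_qc(samples, workers=4):
    """Scatter samples across workers, gather pass/fail flags."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(qc_sample, samples))

# Hypothetical cohort slice: (sample_id, call_rate) pairs.
cohort = [("S001", 0.99), ("S002", 0.91), ("S003", 0.97)]
print(run_qc(cohort))  # → {'S001': True, 'S002': False, 'S003': True}
```

Because each genome's QC is independent, the same pattern scales from a thread pool on one machine to thousands of cluster nodes without changing the per-sample logic.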

Fast Pipeline Development:

  • Custom Nextflow pipelines optimised to the needs of population genomics programmes
  • Automated workflow orchestration producing genome-wide summary statistics across 100,000+ whole genomes
  • Integration of genomic datasets with population-specific Singapore data
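The orchestration idea is that each tool's output becomes the next tool's input, in a configurable order. A minimal sketch of that chaining pattern, with stub stages named after the tools the Nextflow pipeline actually invokes (the stub bodies are placeholders, not real tool calls):

```python
# Minimal sketch of chaining analysis stages the way the Nextflow pipeline
# chains PLINK -> REGENIE -> METAL. Each stub passes a dict of outputs to
# the next stage; the real stages invoke the bioinformatics tools themselves.

def plink_qc(data):
    """Stand-in for PLINK variant and sample QC."""
    data["qc"] = "filtered genotypes"
    return data

def regenie_gwas(data):
    """Stand-in for a REGENIE association scan."""
    data["gwas"] = "per-variant summary statistics"
    return data

def metal_meta(data):
    """Stand-in for METAL meta-analysis across cohorts."""
    data["meta"] = "meta-analysed summary statistics"
    return data

def run_pipeline(stages, data):
    """Run stages in order, feeding each stage's output to the next."""
    for stage in stages:
        data = stage(data)
    return data

result = run_pipeline([plink_qc, regenie_gwas, metal_meta], {"cohort": "demo"})
print(sorted(result))  # → ['cohort', 'gwas', 'meta', 'qc']
```

In Nextflow this chaining is declared as channels between processes rather than a Python loop, which is what makes the real pipeline resumable and cluster-aware, but the data flow is the same.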

Advanced Analytics & Visualisation:

  • PheWeb integration optimised for massive datasets, enabling interactive exploration of population-wide genetic associations
  • Distributed computing architecture allowing multiple researchers to simultaneously analyse different genome cohorts
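The per-variant association statistics that a PheWeb browser displays are produced by REGENIE in the real pipeline, but the underlying idea can be illustrated with a toy chi-square test on a 2x2 table of allele counts in cases versus controls. The counts below are invented:

```python
# Toy per-variant association test: a 2x2 chi-square on allele counts in
# cases vs controls. The counts are invented; in the actual pipeline the
# association scans are run by REGENIE and explored through PheWeb.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a/b: risk/other allele counts in cases,
    c/d: risk/other allele counts in controls.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical variant with the risk allele enriched in cases.
stat = chi_square_2x2(a=300, b=700, c=200, d=800)
print(round(stat, 2))  # → 26.67
```

A large statistic like this flags a variant whose allele frequency differs sharply between groups; at population scale, millions of such tests are computed and then browsed interactively.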

What’s Next

Having successfully reached the 100,000 genome milestone, we’re actively developing next-generation optimisations including:

  • Scaling up processing capabilities for future population studies
  • Agentic AI-powered software engineering to advance other niche research areas
  • Real-time variant discovery enabling immediate clinical insights from new sequences

These efforts will help establish Singapore as the genomic data processing hub for Southeast Asia, with opportunities for regional expansion supporting pan-Asian genomic initiatives and establishing new standards for population-scale genomic research.

 

Talk to us

We can apply this same hyperscale processing architecture to your industry’s most demanding data challenges, whether you’re processing millions of financial transactions for real-time fraud detection, analysing vast manufacturing sensor data for predictive maintenance, managing global supply chain logistics, or processing satellite imagery for precision agriculture. Our proven ability to handle petabyte-scale datasets with parallel processing, intelligent resource allocation, and automated workflow orchestration can reduce your processing time from weeks to hours, whilst lowering computational costs and enabling real-time decision-making.
