HumanCode (Helix)
How Crafted’s CTO Ryan Trunck Built a Machine Learning Model to Launch One of the Leading Personal Genomics Apps
Overview
HumanCode, founded in 2017, used next-generation DNA sequencing technology to deliver personalized insights that help people improve their lives. Following its rapid success, HumanCode was acquired in 2018 by Helix, now the leading population genomics company and named one of the Top 100 Healthcare Technology Companies.
Personal genomics was all the rage in 2017, but machine learning (ML) models and development infrastructure were nowhere near as sophisticated as they are today. Read on to learn how the HumanCode team, which included Crafted’s Ryan Trunck, built a computational biology ML model whose training time on genetic composition fell from 24 hours to 30 minutes, and whose inference time fell from 30 minutes to three.
Challenges & Goals
The HumanCode team’s ultimate goal was to get more people interested in science and their genetic makeup by making DNA understandable to the average consumer. To do this, they set out to build an application that would unlock the value of the human genome and empower individuals to improve their health outcomes. However, they first had to overcome many challenges: scientific and technological as well as business-related.
For starters, they needed to build out the science and framework of a computational biology pipeline running many models. When each user’s dataset holds thousands to millions of data points, not just any pipeline will do. And training a machine learning model to predict the composition of an individual’s ancestry and other traits isn’t easy. Typically in machine learning, you want the number of examples to be significantly higher than the number of inputs. In genomics, it’s the opposite: there are far more input variables than examples. This presented the team with a classic dimensionality problem (often written p ≫ n), which makes models prone to overfitting.
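To make the dimensionality problem concrete, here is a minimal sketch on randomly simulated genotypes, not real data. Each entry is a hypothetical minor-allele count (0, 1, or 2) drawn under Hardy-Weinberg assumptions, and principal component analysis (PCA) via NumPy’s SVD compresses thousands of SNP columns into a handful of components. PCA is one common way to tame p ≫ n in population genetics; the source does not say which technique HumanCode used, and the function names here are illustrative. Because the simulated genotypes carry no real population structure, the example only demonstrates the mechanics.

```python
import numpy as np

def simulate_genotypes(n_samples: int, n_snps: int, seed: int = 0) -> np.ndarray:
    """Hypothetical genotype matrix of minor-allele counts (0, 1, or 2)."""
    rng = np.random.default_rng(seed)
    # Per-SNP minor-allele frequencies; real values would come from a reference panel.
    freqs = rng.uniform(0.05, 0.5, size=n_snps)
    # Under Hardy-Weinberg equilibrium, each genotype is Binomial(2, freq).
    return rng.binomial(2, freqs, size=(n_samples, n_snps)).astype(np.float64)

def pca_project(genotypes: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples onto their top principal components using an economy SVD."""
    # Center each SNP column so the components capture variation between samples.
    centered = genotypes - genotypes.mean(axis=0)
    # With full_matrices=False and n_samples < n_snps, U has shape (n_samples, n_samples).
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

X = simulate_genotypes(n_samples=200, n_snps=5_000)
print(X.shape)  # (200, 5000): far more inputs (p) than examples (n)
Z = pca_project(X, n_components=10)
print(Z.shape)  # (200, 10)
```

Because the economy SVD works along the small sample dimension, its cost grows only linearly with the number of SNPs.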
Another big challenge the HumanCode team faced was that there was minimal ML tooling available at the time. Training a four-layer neural network on a central processing unit (CPU) is slow and inefficient, especially when you consider that a person’s genome spans roughly three billion base pairs and millions of single nucleotide polymorphisms (SNPs, pronounced “snips”). That’s a lot of data for a CPU to process.
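As a rough illustration of why SNP-scale inputs strain a CPU, the sketch below builds a four-layer (four weight matrices) fully connected network in NumPy that maps a genotype vector to ancestry proportions. The SNP count, layer widths, and the use of the five 1000 Genomes super-population labels as outputs are assumptions for illustration, not HumanCode’s actual architecture.

```python
import numpy as np

# Output labels: the five 1000 Genomes super-populations (an illustrative choice).
SUPER_POPULATIONS = ["AFR", "AMR", "EAS", "EUR", "SAS"]

def init_mlp(layer_sizes: list[int], seed: int = 0) -> list[tuple[np.ndarray, np.ndarray]]:
    """Create (weights, bias) pairs for each fully connected layer."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # He initialization, a common choice for ReLU hidden layers.
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        params.append((w, np.zeros(fan_out)))
    return params

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(params: list[tuple[np.ndarray, np.ndarray]], x: np.ndarray) -> np.ndarray:
    """Map genotype vectors to ancestry proportions that sum to 1 per sample."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layers
    w, b = params[-1]
    return softmax(h @ w + b)

n_snps = 10_000  # hypothetical panel of ancestry-informative SNPs
sizes = [n_snps, 256, 64, 32, len(SUPER_POPULATIONS)]  # four weight layers
params = init_mlp(sizes)
n_params = sum(w.size + b.size for w, b in params)
print(n_params)  # 2578949

rng = np.random.default_rng(1)
batch = rng.integers(0, 3, size=(8, n_snps)).astype(np.float64)  # minor-allele counts
proportions = forward(params, batch)
print(proportions.shape)  # (8, 5)
```

Over 99% of the roughly 2.6 million parameters sit in the first layer, so the cost of every training step grows linearly with the number of SNPs fed in. That dense matrix arithmetic is exactly the kind of work GPUs parallelize well.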
Needless to say, bringing science to production is harder than it seems. Science changes rapidly, and even ideas from research papers must be constantly adapted and rethought. Biotechnology is no different, requiring frequent iteration and a constant focus on data compliance: if a product provides medically significant results, it also needs to be HIPAA compliant.
Lastly, there were business drivers that the team needed to take into account, such as the seasonality of selling personal genomics. Around 80% of genomics sales happened in Q4 around Black Friday, with the rest occurring around Mother’s Day and Father’s Day. The timing of an application’s release and competitive pricing were important considerations, as products like 23andMe had skyrocketed in popularity.
Approach
Assembled a diverse team spanning computational biology, DevOps, and application development to build out the science and framework of a computational biology pipeline
Built BABYGlimpse, which sequenced the genomes of two people and predicted the probable attributes of their baby
Ran Monte Carlo simulations for each trait model, sampling all of the probable ways the pertinent SNPs from both parents could combine
Predicted hair color, eye color, height, etc.
Decided to pivot away from BABYGlimpse because it wasn’t driving the intended business value
It got a lot of press, but didn’t drive the desired transaction volume
Its price tag was high just to find out what your baby could potentially look like
Redirected efforts toward a more general application, DNAPassport, focused on the genomics, ancestry, and traits of individual users
Learned that people are fascinated by themselves and where they come from
This included identifying which SNPs predispose a person to certain conditions or genetic advantages
Could calculate propensity for Alzheimer’s, fast-twitch muscles, perfect pitch, academic performance, etc.
Inspired by the 1000 Genomes Project, built an ML model that predicted the composition of an individual’s ancestry
Training on a CPU took 24 hours, which made the model hard to iterate on
Switched to a graphics processing unit (GPU), which got the training cycle down to 30 minutes, a 48x speedup in training iteration
Ultimately went with a custom ancestry microservice running on Google Cloud Platform (GCP) that communicated with the genomics server via gRPC, which got the training cycle down to 2-3 minutes
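The Monte Carlo approach described for BABYGlimpse can be sketched as follows, assuming each parent passes on one of their two alleles at every SNP with equal probability and that SNPs are inherited independently (real genomes have linkage between nearby SNPs). The additive score-and-threshold trait model and all function names are hypothetical placeholders; real trait models, such as those for eye color, are more involved.

```python
import numpy as np

def simulate_offspring(parent_a: np.ndarray, parent_b: np.ndarray,
                       n_trials: int, rng: np.random.Generator) -> np.ndarray:
    """Sample child genotypes under simple Mendelian inheritance.

    parent_a, parent_b: (n_snps, 2) arrays of alleles (0 = reference, 1 = alternate).
    Returns an (n_trials, n_snps) array of alternate-allele counts (0, 1, or 2).
    Assumes SNPs are inherited independently, i.e. no linkage.
    """
    n_snps = parent_a.shape[0]
    snp_idx = np.arange(n_snps)
    # For every simulated child and SNP, pick which of each parent's two alleles is passed on.
    pick_a = rng.integers(0, 2, size=(n_trials, n_snps))
    pick_b = rng.integers(0, 2, size=(n_trials, n_snps))
    # Fancy indexing broadcasts snp_idx across trials: result[t, j] = parent[j, pick[t, j]].
    return parent_a[snp_idx, pick_a] + parent_b[snp_idx, pick_b]

def trait_probability(parent_a: np.ndarray, parent_b: np.ndarray,
                      weights: np.ndarray, threshold: float,
                      n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(trait) with a hypothetical additive model: trait if weighted score >= threshold."""
    rng = np.random.default_rng(seed)
    children = simulate_offspring(parent_a, parent_b, n_trials, rng)
    scores = children @ weights  # one score per simulated child
    return float(np.mean(scores >= threshold))

# One SNP, both parents heterozygous (one reference and one alternate allele).
mom = np.array([[0, 1]])
dad = np.array([[0, 1]])
p_recessive = trait_probability(mom, dad, weights=np.array([1.0]), threshold=2.0)
print(f"Estimated probability: {p_recessive:.3f}")  # close to 0.25
```

With two heterozygous parents, a trait that requires two copies of the alternate allele (a recessive trait) should appear in close to 25% of simulated children, matching the Punnett-square expectation. The same sampling loop extends to many SNPs per trait model without changes.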
Outcomes
After countless iterations, the HumanCode team got their ML models to process genetic ancestry composition and traits across many models in two to three minutes. As a result, the DNAPassport application performed exceptionally well, regularly competing with the National Geographic app for the number one spot on the app charts. This caught the attention of Helix, a rapidly growing, next-generation population genomics company, which acquired HumanCode in 2018 to help further its mission of making science consumer-digestible.
HumanCode’s DNAPassport application helped people learn the story of where they came from and where their roots were scattered across the world, leading to a better understanding of how genetics contributes to human health and disease. This armed users with data and insights they could actually use to make healthy lifestyle choices, which can ultimately improve health outcomes and accelerate research.
Conclusion
Ryan brought his experience to Crafted, where we’ve helped companies of all shapes and sizes (including other healthtech companies) drive tangible business outcomes. Whether your organization needs help shipping high-quality software fast, or you’re looking to learn ML best practices from our team of seasoned experts, reach out to the Crafted team today!