It has often been said that searching for genetic clues to rare disease is like looking for a needle in a haystack. It’s even more difficult when you’re not sure what makes the needle different from the hay.

That was essentially the challenge facing researchers in rare disease genetics up until two years ago, when a team led by Daniel MacArthur, PhD, launched the Exome Aggregation Consortium (ExAC) database—a collection of 60,000 sequenced human exomes (the protein-coding portion of the human genome).

The database was by far the largest collection of genetic sequences that could be used as a point of reference for research into the genetics of rare disease. MacArthur, an investigator with the Center for Human Genetics Research at Massachusetts General Hospital, has a co-appointment at the Broad Institute of MIT and Harvard, where the browser was developed and launched.

Daniel MacArthur, PhD

Since its launch, the ExAC browser has been accessed well over six million times, and MacArthur estimates that the tool has assisted in the diagnoses of tens of thousands of rare disease patients.

The tool has also provided new insights into rare disease research and disease prognosis, and in some cases has identified new therapies for previously untreatable diseases.

In one published case, a researcher investigating a devastating inherited genetic disorder was able to tell one of his patients that she wasn’t carrying the gene variant that causes the disease. The woman’s mother had died from the disease and doctors had initially told her that she was also carrying the disease-causing variant.

“That’s transforming,” MacArthur said. “The ability to take a patient from what is effectively a terminal disease to having no more than a baseline risk... you can imagine the effect that would have.”

It’s a pretty impressive track record for a two-year-old project, and MacArthur and his team are just getting started.

From Genomes to Exomes

The human genome is the unique chemical code that guides our development. It is a massive collection of data that includes 3.2 billion bases of DNA. Stretched end to end, this sequence of letters would reach from Boston to Sedgwick, CO (about 200 miles outside Denver).

The exome is the 1-2% of the genome that codes for protein. This is where most of the disease-causing genetic mutations are found, so it is a good target for investigating what goes wrong in different diseases.

The trouble is that without a baseline reference to compare one exome against another, it’s hard to tell which variants are truly out of the ordinary.

“If we sequence a patient’s complete exome, what we find is somewhere between 30,000 and 50,000 genetic changes or variants,” MacArthur explains. “And in a given rare disease patient, only one or two of those variants will be responsible for their disease.”

When MacArthur first confronted this problem in his rare disease research back in 2012, he quickly determined that the existing resources were not up to scratch. There were some small collections of exome sequences, but none of them were large enough to help him separate normal variation from the unique.

The ExAC Browser is Born

MacArthur and a team of collaborators from the Broad Institute decided to tackle the issue head-on. Over a period of 18 months, they compiled a massive database of exome information from more than 60,000 patients. All of the data was contributed to the project by researchers from around the world, including major contributions from Mass General researchers Sekar Kathiresan, MD, Mark Daly, PhD, Jose Florez, MD, PhD, and Ben Neale, PhD.

“If you think about it, this is over $40 million worth of data that had been generated for a whole variety of projects, and the principal investigators were willing to push the results of that out to the public domain without any restrictions on use, which I thought was really cool,” MacArthur said.

Although there were some fits and starts along the way, the team was able to complete the browser in time to introduce it at the American Society of Human Genetics Meeting in October of 2014.

Due to its size and the variety of participants that it covers, the browser has turned out to be far more efficient for filtering than any previous exome database, MacArthur said. “We can often reduce the number of plausible changes that we find in a patient’s exome by a factor of five or 10 just by applying these filters.”
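To give a rough sense of what this kind of filtering involves, here is a minimal sketch in Python. It is not the ExAC browser's actual code or interface. The variant labels, the frequencies, the `filter_rare_variants` helper and the 0.1% cutoff are all hypothetical, chosen only to illustrate the idea that a variant seen often in a large reference population is unlikely to cause a rare disease.

```python
# Hypothetical reference table: variant label -> how often it appears
# in a large reference population (allele frequency, 0.0 to 1.0).
# All labels and numbers are invented for illustration.
REFERENCE_FREQUENCIES = {
    "1:1000:A>G": 0.31,      # common: seen in many healthy people
    "2:2000:C>T": 0.004,     # uncommon, but above the cutoff used here
    "3:3000:G>A": 0.00001,   # extremely rare
}


def filter_rare_variants(patient_variants, reference, max_frequency=0.001):
    """Keep variants that are absent from the reference or rarer than the cutoff.

    A variant missing from the reference is treated as frequency 0.0,
    so it stays on the candidate list.
    """
    return [
        variant
        for variant in patient_variants
        if reference.get(variant, 0.0) <= max_frequency
    ]


# A toy "patient" with four variants; a real exome has tens of thousands.
patient_variants = ["1:1000:A>G", "2:2000:C>T", "3:3000:G>A", "4:4000:T>C"]
candidates = filter_rare_variants(patient_variants, REFERENCE_FREQUENCIES)
print(candidates)  # ['3:3000:G>A', '4:4000:T>C']
```

In this toy case the filter cuts four variants down to two. The larger and more varied the reference population, the more confidently a common variant can be set aside.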

In October of 2016, MacArthur and team released a new data set that brings the total number of exomes available for searching to 126,000—essentially doubling the size of the original data pool. They have also added sequences for 15,000 full genomes to the database. This database, known as gnomAD, is currently in beta testing.

MacArthur, a self-described “evangelist” for data sharing, believes the world is moving towards a place where more researchers will be expected to make their results, data and methods as open and available as possible.

While he acknowledges that practical considerations—such as patient privacy and the need to sustain academic careers—may prevent others from collaborating as openly as those involved in the ExAC project, he hopes the same spirit of sharing will spread to other disciplines as well.

“I think we can overcome those challenges, and that science as a whole benefits enormously from having data available in as open a way as possible.”
