To understand how cellular machinery functions, scientists have looked to an organism’s DNA. But genome sequence has not proven to be the complete instruction manual that researchers had hoped for.

“Now that we have hundreds of thousands of genome sequences, the new picture that emerges is confusing,” says Marc Vidal of Harvard Medical School. Genetic mutations do not always predict a cell’s function or an organism’s health. For Vidal and others, the major missing piece in cellular systems biology is the interactome: a map of all the protein interactions in a cell.

A dynamic network of protein complexes, large and small, helps to carry out genetic instructions. Transcription factors bind to DNA and activate genes, the ribosome reads RNA and assembles proteins, and kinases and phosphatases modify proteins to regulate cellular signals.

“Depending on the state of the network, genetic mutations have very different effects,” Vidal says. In the future, he thinks, interactome maps for different cells within an organism could collectively be the Rosetta Stone that helps decode the relationship between genetic form and function.

Refined mapping techniques developed in yeast models or based on mass spectrometry can now screen tens of thousands to millions of protein interactions. Each approach provides a different view of an organism’s interactome, shining a light on a slice of it. With the right combination of maps, Vidal asserts, a complete picture can emerge.

As researchers build out a specific interactome, individual proteins within it whose functions are unknown can be put into biological context based on what’s known about how their neighbors function, says Ed Huttlin of Harvard Medical School.

The Scientist talked with interactome experts to learn more about three common techniques for mapping protein interactions on a large scale.

Direct Interactions

RESEARCHER: Shelly Trigg, systems biologist and postdoc in the lab of Steven Roberts, University of Washington

ORGANISM: Arabidopsis thaliana

TECHNIQUE: Cre-reporter–mediated yeast two-hybrid coupled with next-generation sequencing (CrY2H-seq)

While working in Joseph Ecker’s lab at the Salk Institute for Biological Studies as a graduate student, Trigg wanted an economical, technically straightforward assay for screening large numbers of interactions among plant proteins. She and her colleagues turned to the yeast two-hybrid (Y2H) assay, which has been used to create large-scale interactome maps for plant, human, yeast, and bacterial cells. This assay involves expressing two proteins in genetically modified yeast that grow only if the two proteins directly interact.

Here’s how it works: Researchers engineer different yeast strains carrying plasmids that encode specific test proteins. Then, they mate pairs of strains so that a given yeast cell will make two introduced proteins. If the two proteins interact inside the cell, they activate a gene to produce a nutrient missing from the yeast culture. Growing cells signal an interaction, and sequencing the genetic instructions from the plasmids within those cells reveals which two proteins communicated.

In 2011, a consortium of researchers used this technique to map the Arabidopsis interactome, consisting of proteins encoded by roughly 8,000 genes. To speed up cell culture and screening, they mated yeast cells in 96-well plates, with each well testing interactions between one protein and a collection of 192 others. The researchers found thousands of pairwise interactions among 2,700 proteins.

In 2017, Trigg and her colleagues sped up screening even further by mixing all the engineered yeast cells in the same culture at the very beginning of the assay. To easily identify protein interactions in growing cells, Trigg re-engineered the yeast so that interacting proteins caused cells to produce an enzyme called Cre recombinase, which permanently joined the DNA encoding the introduced proteins into one sequence.

Trigg scraped all the growing cells off the plate, extracted the plasmids with the joined genes, amplified them with PCR, and performed next-generation sequencing to determine which genes were linked. A bioinformatics pipeline then identified protein interactions by finding sequences of joined genes and eliminating false positives (Nat Methods, 14:819–25, 2017).
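The last step, turning joined-gene reads into a list of interactions, can be illustrated with a toy sketch. This is not the published pipeline: it assumes each sequencing read has already been reduced to a pair of gene names, and it uses a hypothetical rule that keeps a pair only if it shows up in at least `min_replicates` replicate screens. The gene names and the cutoff are invented for illustration.

```python
from collections import defaultdict

def call_interactions(replicate_reads, min_replicates=2):
    """Return {pair: replicate_count} for pairs seen in enough replicates.

    replicate_reads: one list per replicate screen; each item is a
    (gene_a, gene_b) tuple read off a single fused-junction sequence.
    min_replicates: hypothetical cutoff for dropping one-off false positives.
    """
    support = defaultdict(set)
    for rep_index, reads in enumerate(replicate_reads):
        for gene_a, gene_b in reads:
            # A-B and B-A describe the same interaction, so normalize the order.
            pair = tuple(sorted((gene_a, gene_b)))
            support[pair].add(rep_index)
    return {
        pair: len(reps)
        for pair, reps in support.items()
        if len(reps) >= min_replicates
    }

# Toy data: three replicate screens with made-up transcription factor names.
replicates = [
    [("TF1", "TF2"), ("TF3", "TF4")],
    [("TF2", "TF1"), ("TF5", "TF6")],
    [("TF1", "TF2"), ("TF3", "TF4")],
]
interactions = call_interactions(replicates, min_replicates=2)
```

Here the TF5/TF6 pair appears in only one replicate and is discarded, which is the kind of filtering that replicate screens make possible.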

FINDINGS: The researchers screened 3.6 million unique binary interactions and identified a network of 8,577 interactions among 1,453 transcription factors. In general, Y2H assays require replicate experiments to eliminate false positives produced by quirks of yeast genetics. “This technique allows you to do replicates at a scale not possible before,” Trigg says.

Even with 10 replicates, though, Trigg and her colleagues only captured about 50 percent of the possible interactions. However, many of the interactions they saw were new, and the resulting network contained connections between unexpected classes of proteins, Trigg says.

PROS:

  • Of the three most common large-scale interactome mapping techniques, the Y2H assay is the only one that can specifically identify direct interactions.

CONS:

  • Introduced proteins can cause false positives (because their overexpression can generate nonspecific interactions) and false negatives (because the yeast nucleus is an unnatural environment for these interactions), so these experiments require many controls.

Systematic Screens

RESEARCHER: Ed Huttlin, proteomics researcher, Harvard Medical School

ORGANISM: Human

TECHNIQUE: Affinity-purification mass spectrometry (AP-MS)

Huttlin and his colleagues are working to systematically map interactions among the 20,000 or more possible proteins in a typical human cell. The researchers have developed a high-throughput method for identifying protein interactions using a technique called affinity-purification mass spectrometry. They set out to trace interactions from as many as 500 different proteins each month, and their interactome map has now grown to include proteins from about half of the genes in a typical human cell, Huttlin says.

First, the researchers create a tagged protein to serve as the bait that pulls down protein complexes. They start with a library of cloned human genes, called the ORFeome. Then they put a gene from this library into a plasmid and insert the plasmid into cultured human embryonic kidney cells. The plasmid also contains genetic instructions for a short peptide derived from noneukaryotic cells that acts as a handle for purification. This means that when the cells overexpress the protein encoded by the introduced gene, the protein also carries the peptide tag dangling from its C-terminus.

The researchers lyse the cells and expose their contents to an antibody that binds to the peptide tag. The antibody pulls the tagged protein, along with others interacting with it, out of the lysate. Next, the protein complexes are digested into peptides and the fragments injected into a mass spectrometer. An algorithm uses the mass spec data to reassemble the peptide sequences and identify each protein in a complex. Finally, the researchers analyze the amount of signal from each protein to distinguish specific interactions from the protein background remaining from the purification.
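One simple way to picture that final filtering step is to compare how much of a protein comes down with one bait versus how much of it appears in every other purification. The sketch below is an illustration only, not the scoring method Huttlin's team uses; the function name, the fold-enrichment cutoff, the pseudocount, and all protein names and signal values are assumptions.

```python
from statistics import mean

def specific_preys(pulldowns, bait, min_fold=5.0, pseudocount=1.0):
    """Return {prey: fold_enrichment} for proteins enriched with one bait.

    pulldowns: {bait_name: {prey_name: signal}} from many purifications.
    A prey that shows up at similar levels with unrelated baits is treated
    as background (for example, a protein that sticks to the antibody).
    """
    hits = {}
    for prey, signal in pulldowns[bait].items():
        background = [
            counts.get(prey, 0.0)
            for other_bait, counts in pulldowns.items()
            if other_bait != bait
        ]
        baseline = mean(background) if background else 0.0
        # The pseudocount avoids division by zero for never-before-seen preys.
        fold = signal / (baseline + pseudocount)
        if fold >= min_fold:
            hits[prey] = fold
    return hits

# Toy data: STICKY appears in every purification, PREY_1 only with BAIT_A.
pulldowns = {
    "BAIT_A": {"PREY_1": 40.0, "STICKY": 30.0},
    "BAIT_B": {"PREY_2": 25.0, "STICKY": 28.0},
    "BAIT_C": {"STICKY": 32.0},
}
hits = specific_preys(pulldowns, "BAIT_A")
```

In this toy data set, only PREY_1 passes the cutoff for BAIT_A, while the ubiquitous STICKY protein is dismissed as background.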

While this method identifies both direct and indirect interactions, it cannot distinguish between the two. In direct interactions, two proteins touch each other, and in indirect interactions, a protein might be separated from the tagged one by several others in a complex. On average, a protein in their screen has about 10 interactions, Huttlin says.

FINDINGS: In 2017, Huttlin and colleagues mapped more than 56,000 interactions in human embryonic kidney cells, capturing more than 25 percent of the protein-coding genes in the human genome and creating the largest human interactome network to date (Nature, 545:505–9, 2017). Their map revealed more than 29,000 previously unknown associations.

The researchers used their map to draw a network of proteins associated with cellular fitness. They also used a database of disease-linked genes to draw interaction networks for those illnesses.

PROS:

  • Interactions come from proteins expressed in their native environment.
  • Overexpressed tagged proteins can pull out complexes of relatively low abundance, which are generally hard to detect with mass spec proteomics methods.

CONS:

  • The method generally cannot distinguish between direct and indirect interactions.
  • The method struggles to identify transient interactions that fall apart during purification.
  • Size or toxicity might prevent cells from expressing some tagged proteins.
  • The peptide tag could potentially interfere with complex formation.

Network Dynamics

RESEARCHER: Leonard Foster, biochemist, University of British Columbia

ORGANISM: Mouse

TECHNIQUE: Protein correlation profiling (PCP) and stable isotope labeling of mammals (SILAM)

Foster and his colleagues wanted to capture changes in protein interaction networks on a large scale, something not possible with Y2H and AP-MS, which each capture only a snapshot of the interactome. They used their approach to compare interactomes of cells within different mouse tissues. (Previous work had looked only at one type of tissue or cell.)

To do this, the researchers sorted protein complexes by size, then used mass spectrometry to identify and quantify them. Their profiling technique capitalizes on the idea that proteins that travel together during chromatography or electrophoresis are likely part of the same complex. The isotope label, incorporated in a 13C-labeled arginine fed to the mice, provides a way to measure the amount of each complex relative to the entire protein content.

The researchers collected heart, brain, skeletal muscle, lung, kidney, liver, and thymus tissue from mice. Then they used chromatography to separate the protein complexes in each tissue by size. They collected 55 fractions from each column and measured the protein levels in each fraction using mass spectrometry. An algorithm compared the levels of each protein in each fraction to identify proteins that migrated together.
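The idea of finding proteins that migrate together can be sketched by correlating their abundance profiles across fractions. This is a minimal illustration rather than the algorithm Foster's team used: it assumes a plain Pearson correlation with a hypothetical cutoff of 0.9, and it uses six made-up fractions per protein instead of 55.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-migration information
    return cov / sqrt(var_x * var_y)

def comigrating_pairs(profiles, min_r=0.9):
    """Return (protein_a, protein_b, r) for pairs with similar profiles.

    profiles: {protein_name: [abundance in fraction 1, fraction 2, ...]}
    min_r: hypothetical correlation cutoff for calling a pair co-migrating.
    """
    pairs = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        r = pearson(prof_a, prof_b)
        if r >= min_r:
            pairs.append((name_a, name_b, round(r, 3)))
    return pairs

# Toy data: P1 and P2 peak in the same fraction, P3 peaks later.
profiles = {
    "P1": [0, 2, 10, 3, 0, 0],
    "P2": [0, 1, 9, 4, 0, 0],
    "P3": [0, 0, 0, 1, 8, 2],
}
pairs = comigrating_pairs(profiles)
```

P1 and P2 are flagged as likely members of the same complex, while P3, which elutes in later fractions, is paired with neither.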

FINDINGS: From the seven different tissues, the researchers identified 38,117 interactions, about 70 percent of which were new (bioRxiv, DOI:10.1101/351247, 2018).

Few tissue-specific interaction maps come from experimental measurements. Most come from modeling, using tissue-specific gene and protein expression to anticipate how an organism’s general interactome might change in that tissue. But when Foster’s team compared their experimentally measured tissue-specific interactomes to predictions, the predictions, commonly used to discover disease biomarkers, were roughly as accurate as random guesses. “Our data show that approach [to predicting interactions] is not reliable,” Foster says.

PROS:

  • The isotope label enables researchers to easily track how interactions change in response to different stimuli.
  • Native proteins isolated from tissue reflect complexes formed under physiologically relevant conditions.
  • Protein correlation profiling makes it possible to measure many different interaction networks in different tissues, a task too laborious for affinity-purification mass spec approaches and impossible in tissue-free yeast.
  • Based on his experience with affinity-purification mapping, Foster estimates this PCP approach is 50 times faster than affinity-purification mass spec.

CONS:

  • The method cannot distinguish between direct and indirect interactions.
  • It also can’t tell apart two complexes that co-migrate, and it does a poor job of finding low-abundance proteins.