
Single Cell Proteomics in Biomarker Analysis

August 2021

The last decade has seen rapid development in our ability to test multiple biomarkers in individual cells. This is particularly true for DNA and RNA markers, where genomics and transcriptomics have advanced at an incredible pace. However, while DNA and RNA analysis is essential for understanding gene expression, a comprehensive single-cell analysis requires looking at other biomarkers as well, particularly proteins.
PROTEOMICS PROBLEM
Why proteomics? To put it simply, because proteins matter, and we know far less about proteome diversity at the single-cell level than we do about the genome and transcriptome.
Just as the transcriptome is not a simple copy of the genome, the proteome is not a direct image of the transcriptome: RNA and protein copy numbers in a cell often do not correlate. With only around 20,000 genes, a few hundred thousand mRNA molecules, and millions of protein molecules in a single mammalian cell, there is too much variation not to look at it directly. Moreover, the same gene can give rise to many proteoforms, through alternative splicing and gene fusions, while post-translational modifications add yet another dimension of variation. All of this speaks to the importance of examining proteome variation directly, cell by cell.
The problem is that we do not have technology that would allow us to look at the proteome at the scale available for genome and transcriptome analysis at the single-cell level.
WHAT IS NEEDED
What would be the requirements for such a technology? Many are in fact similar to what is needed in single-cell transcriptome analysis.
The essential requirement is sensitivity. For nucleic acids, this problem is solved by PCR, which amplifies DNA (and, via reverse transcription, RNA) to easily detectable levels. There is no equivalent of PCR for proteins. Without a way to amplify the sample, proteins are difficult to work with because their abundance in a single cell spans an enormous range, from more than half a million copies down to fewer than ten. This has been a major technical challenge for proteomics for a long time. Because the target cannot be amplified, protein analysis must instead focus on increasing the sensitivity of detection.
High content is the second requirement. A single-cell proteome can contain as many as a hundred thousand different proteoforms. It may not be necessary to capture all of that information for every proteome, but a good method should capture a sufficiently large fraction of it.
High throughput is also required: many cells must be testable in a single run. This applies both to the number of cells analyzed and to spatial resolution when mapping the localization of markers in tissues and cells; the assay should not be confined to a very small area.
Good methods will also have to be accurate. Instead of just four variables (i.e., four bases) as is the case with nucleic acids, protein analysis deals with twenty variables (i.e., amino acids) and a correspondingly higher chance of error.
It is also important to eliminate bias. Proteins vary widely in biochemical properties (e.g., charge, size, hydrophobicity) due to amino acid differences and post-translational modifications (e.g., phosphorylation, glycosylation). These differences can introduce technical bias that leads to under- or over-representation of some species or variants.
Other considerations are not technical in nature but can be just as important for any technique that aims for wider impact.
a. It would need to be user-friendly. Technically complicated methods that require a high level of specialization and training have a harder time gaining wide adoption.
b. It needs to be cost-effective, both in terms of equipment setup cost and per-assay cost.
c. Time is the remaining requirement. Slow methods simply do not work well where high throughput is needed. Assay time adds to labor costs, and labor cost is a factor when choosing which method to use.
STATE OF THE ART
Currently, the most common approach for looking at the proteome is based on mass spectrometry (MS). It characterizes a protein by measuring the mass of either the whole protein or a mixture of shorter peptides derived from it. Recent technical improvements in this area aim at single-cell mass spectrometry. Sensitivity is for the most part still in the nanogram range, but new technical improvements promise to push detection lower. Using new and improved MS approaches, it is now possible to analyze as many as one thousand proteins from a single cell. While encouraging, this still falls short of the content level needed for single-cell proteomic research. In addition, MS remains a somewhat challenging approach: not all proteins and peptides ionize equally well or pass equally through the spectrometer, which can introduce bias into the analysis and errors into the data.
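To make the peptide-based variant concrete, MS identification ultimately rests on computing expected peptide masses and matching them against measured ones. Below is a minimal sketch of the mass-computation step using standard monoisotopic residue masses; the function name and example peptide are our own illustration, not any particular instrument's software.

```python
# Minimal sketch: computing a peptide's monoisotopic mass, the quantity
# a mass spectrometer measures and matches against a reference database.
# Residue masses are standard monoisotopic values; names are illustrative.

WATER = 18.010565  # monoisotopic mass of H2O, added once per peptide (termini)

RESIDUE_MASS = {   # standard monoisotopic residue masses (Da)
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def peptide_mass(sequence: str) -> float:
    """Sum the residue masses and add one water for the peptide termini."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER

# Example: a short tryptic peptide (trypsin cuts after K or R)
print(round(peptide_mass("SAMPLER"), 4))  # mass in daltons
```

In practice, measured masses are matched against a database of such computed values, with tolerances for instrument error and possible modifications.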
Another proteomics approach is based on immunoassays. Targets are detected with specific antibodies labeled with unique tags (fluorophores, enzymes, mass tags, oligo barcodes). Immunoassays have been used for many years in methods like ELISA, immunohistochemistry, and cell sorting. Sensitivity is good, down to the femtogram level, thanks to various signal amplification methods. This approach also has its limitations, particularly when multiplexing protein markers in a spatial context. Content can be increased by running sequential rounds of assays with unique tags that identify each marker, allowing parallel analysis of up to a hundred markers in a spatially limited region of interest. However, the total assay time is long because of the sequential testing rounds, and with each additional round the sample can be progressively degraded. This caps the total number of rounds and thus the total number of protein biomarkers that can be analyzed per assay. Additionally, immunological methods depend on the availability of antibodies, which can be costly and are not always available, especially for new proteins or specific proteoforms.
NEW METHODS
New methods are being developed to circumvent the limitations of current ones and provide the high content level of information needed to investigate the proteome at the single-cell level. They promise to detect proteins with single-molecule sensitivity and at throughput levels similar to what is now achieved in single-cell transcriptome analysis. They are based on single-molecule protein sequencing and fingerprinting. The difference between the two is simple: sequencing identifies the protein of interest by determining its exact and complete amino acid sequence, while fingerprinting determines only a partial sequence and infers the protein's identity by comparison to already known reference sequences. Either way, if you can identify individual protein molecules in a complex mixture of thousands of different proteins, you can get a good picture of the variation in protein expression from cell to cell. It is also a good way to quantify protein levels without having to resort to reference controls. Some of these new technologies are already in commercial development.
One of the companies working to bring these technologies to market is Erysion, a startup that came out of the University of Texas in 2018. Their technology is based on what is known as fluorosequencing, essentially a modification of traditional Edman degradation sequencing. In Edman sequencing, amino acids are removed one by one from the N-terminus of the protein and characterized. In Erysion's approach, amino acid side chains are first chemically labeled with fluorescent tags, and the labeled peptides are immobilized on the surface of an array. As each amino acid is removed from the peptide terminus, the event is detected as a drop in fluorescence output in the corresponding fluorescent channel. One by one, the peptides are sequenced, and by overlapping data from many peptides, the sequences of proteins are assembled. What makes this fingerprinting rather than sequencing is that the side chains of every amino acid cannot be differentially labeled: only lysine, cysteine, tryptophan, and tyrosine can be labeled this way. This leaves gaps in the sequence, and those gaps are filled by referencing the distribution of these four amino acids in known reference protein sequences. Compared to mass spectrometry, this approach is much more sensitive: it promises single-molecule sensitivity from far less sample and over a much wider range of individual protein concentrations.
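The fingerprinting step lends itself to a short illustration. Assuming only lysine (K), cysteine (C), tryptophan (W), and tyrosine (Y) can be labeled, each peptide reduces to a positional pattern of those four residues, which is then matched against patterns computed from reference sequences. The sketch below is a toy version of that idea; all names and sequences are hypothetical, and it is not Erysion's actual pipeline.

```python
# Toy sketch of fluorosequencing-style fingerprinting: only K, C, W, Y
# carry labels, so a peptide reduces to the positions of those residues;
# identity is inferred by matching the partial pattern to a reference set.

LABELABLE = set("KCWY")

def fingerprint(peptide: str) -> str:
    """Mask every unlabelable residue; only K/C/W/Y positions survive."""
    return "".join(aa if aa in LABELABLE else "." for aa in peptide)

def match(observed: str, references: dict[str, str]) -> list[str]:
    """Return reference peptides whose fingerprint equals the observed one."""
    return [name for name, seq in references.items()
            if fingerprint(seq) == observed]

references = {  # hypothetical reference peptides
    "pepA": "MKTAYIAK",
    "pepB": "MQTAYIAR",
    "pepC": "GKTAYIAK",
}

observed = fingerprint("MKTAYIAK")  # ".K..Y..K"
print(observed, "->", match(observed, references))
# ['pepA', 'pepC']: the same fingerprint can match several references
```

Note that two different peptides can share a fingerprint, which is why identification is an inference against the reference proteome rather than a direct readout.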
Nautilus Biotechnology is another startup developing a new method to sequence proteins. It was founded in 2016 by a group from Stanford University, and their approach to sequencing is very different: they use antibody binding to determine the protein sequence. Proteins are first immobilized on an array, then repeatedly probed with antibodies, and every round of binding is imaged. These antibodies do not recognize any specific protein; instead, each binds a short epitope only three amino acids long. The antibodies are uniquely tagged, so every round of imaging adds a piece of information. This is repeated many times, and the results are digitized and analyzed to decode the proteome. This approach also has the potential to allow single-molecule identification as well as quantification over a wide dynamic range.
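The decoding logic can be illustrated with a toy example: each probing round asks whether a protein carries a given three-residue epitope, and the accumulated yes/no outcomes progressively narrow the candidate list. This is a deliberate simplification of the real, probabilistic decoding; all names and sequences below are hypothetical.

```python
# Toy sketch of decoding identity from rounds of short-epitope binding:
# each round reports whether a 3-residue epitope is present, and the
# accumulated answers narrow the candidate list.

def contains(seq: str, epitope: str) -> bool:
    return epitope in seq

def decode(binding: dict[str, bool], candidates: dict[str, str]) -> list[str]:
    """Keep candidates consistent with every observed binding outcome."""
    return [name for name, seq in candidates.items()
            if all(contains(seq, ep) == hit for ep, hit in binding.items())]

candidates = {  # hypothetical proteins on the array
    "protX": "MKTWAYIAGELK",
    "protY": "MKTAAYIAGQLK",
    "protZ": "MSSWAYRAGELK",
}

# Outcomes from three probing rounds with different 3-mer binders
binding = {"TWA": True, "GEL": True, "KTA": False}
print(decode(binding, candidates))  # ['protX']
```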
QuantumSi was founded in 2015, and their technology is based on time-domain sequencing. Peptides are linked to the surface of a microwell array and exposed to two types of molecules: recognizers and cutters. Recognizers, which are labeled with fluorophores, bind specific terminal amino acids. Binding events are differentiated not so much by fluorescence color as by differences in their binding kinetics over time, which vary with the type of binding interaction. Cutters then remove the terminal amino acid, allowing the next cycle of probing with the recognizers. Eventually, the entire peptide is analyzed and the complete protein sequence can be assembled.
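Conceptually, the workflow is a simple loop: in each cycle a recognizer reads the exposed N-terminal residue and a cutter then removes it. The sketch below captures that cycle schematically; it assumes perfectly specific recognizers and uses a hypothetical recognizer set, unlike the real chemistry, where recognizers cover groups of residues and report kinetic signatures.

```python
# Schematic of a recognizer/cutter cycle: each iteration a recognizer reads
# the current N-terminal residue (assumed perfectly specific here), then a
# cutter removes it, exposing the next residue. Hypothetical recognizer set.

def sequence_peptide(peptide: str, recognizable: set[str]) -> list[str]:
    reads = []
    while peptide:
        terminal = peptide[0]
        # Recognizer step: record the residue, or a gap if nothing binds
        reads.append(terminal if terminal in recognizable else "?")
        # Cutter step: remove the terminal residue for the next cycle
        peptide = peptide[1:]
    return reads

recognizable = {"L", "I", "V", "F", "Y", "W"}  # hypothetical recognizer set
print(sequence_peptide("LIVEFWAY", recognizable))
# ['L', 'I', 'V', '?', 'F', 'W', '?', 'Y']
```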
Another approach is being developed by Encodia. They use a reverse-translation technology that turns peptide sequences into DNA, which can then be read by DNA sequencing. Recognition agents labeled with DNA-encoding tags bind N-terminal amino acids, and those tags are used to build a DNA library for sequencing.
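In spirit, the reverse-translation step maps each recognized N-terminal amino acid to a unique DNA barcode and appends it to a growing DNA record that is later read on a standard sequencer. Here is a conceptual sketch; the tag sequences and names are invented for illustration and do not reflect Encodia's actual chemistry.

```python
# Conceptual sketch of "reverse translation": each recognized N-terminal
# amino acid appends its unique DNA tag to a growing record that can later
# be read on a standard DNA sequencer. Tag sequences here are invented.

DNA_TAG = {  # hypothetical amino-acid -> barcode mapping
    "M": "ACGT", "K": "TTGA", "T": "CCAG", "A": "GGTC", "Y": "ATCG",
}

def reverse_translate(peptide: str) -> str:
    """Encode the peptide, one terminal residue per cycle, as DNA barcodes."""
    record = []
    for aa in peptide:  # each iteration = one recognition/transfer cycle
        record.append(DNA_TAG.get(aa, "NNNN"))  # NNNN = unrecognized residue
    return "".join(record)

print(reverse_translate("MKTAY"))  # ACGTTTGACCAGGGTCATCG
```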
Protein identification based on sequencing would have a number of advantages over more conventional methods. The obvious ones are sensitivity and content: single-molecule identification would allow the high-content analysis needed for the single-cell proteome. It would also allow accurate protein quantification, even from much less sample.
Of course, methods based on protein sequencing have limitations of their own. They do not address the post-translational modifications of proteins. Alternative methods based on the adaptation of nanopore sequencing might be better positioned to address this, but pore-based sequencing is still in early development, with the current focus on improving readout accuracy with new pore proteins.
CONCLUSION
In the next three to five years, we should expect to see several different technologies coming to market. They promise to finally allow proteomics to catch up with genomics and transcriptomics, which will be very important in many areas, from early drug development to diagnostics.
