Consider SNPs when designing PCR and qPCR assays

The rapidly increasing number of known SNPs

Next-generation sequencing (NGS) has led to a dramatic increase in identified SNPs. SNPs can pose a problem when they underlie primer or probe sequences used in PCR/qPCR. Learn what effect they can have and how you can minimize their impact on your PCR assays.

Widespread adoption of next-generation sequencing (NGS) has led to an exponential increase in cataloged sequence data. One consequence has been a dramatic increase in the number of identified single nucleotide polymorphisms (SNPs; see the sidebar defining SNPs, below). As of November 7, 2016, Build 149 of the NCBI dbSNP reference database listed 558 million submitted SNPs (subSNPs) for Homo sapiens, of which 154 million were reference SNPs (refSNPs) [1]. This represents a >19X increase in the number of subSNPs over 10 years (from 28 million subSNPs in 2006, Build 126) and an ~13X increase in the number of refSNPs (Figure 1).


Figure 1. Dramatic increase in the number of human SNPs over the past 5 and 10 years.
* NCBI dbSNP Build 149 (Nov 7, 2016); www.ncbi.nlm.nih.gov/dbvar/content/org_summary/ (accessed Dec 19, 2016).
† www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi (accessed Dec 19, 2016).
‡ refSNP, or reference SNP cluster, is defined as a SNP or group of SNPs that map to a specific genomic sequence region. The SNPs of an existing build are all refSNPs. In creating a new build, the refSNPs from the prior build and new subSNPs are both compared to updated genome sequence data to minimize duplications among refSNPs and subSNPs. This process will assign subSNPs to existing refSNP clusters or new refSNPs.
§ subSNP stands for “submitted SNP” and is defined as a SNP submitted since the last build that was found to be distinct from refSNPs after multiple cycles of BLAST analyses.

Based on the number of refSNPs in Build 149 and a genome size of 3.4 x 10^9 bp [2], the human genome should contain a SNP approximately once every 22 bases on average. Other common model systems show a similarly high frequency of SNPs (Table 1).

Table 1. High frequency of SNPs in common model systems.
Species | NCBI dbSNP build* | subSNP† (million) | refSNP‡ (million) | Genome size (bp)§ | SNP frequency (refSNPs per bp)
Homo sapiens (human) | Build 149 (Nov 7, 2016) | 557.9 | 154.2 | 3.40 x 10^9 | 1 in 22
Bos taurus (cow) | Build 148 (Jun 24, 2016) | 293.8 | 100.2 | 3.62 x 10^9 | 1 in 36
Mus musculus (mouse) | Build 146 (Nov 24, 2015) | 135.7 | 80.4 | 3.23 x 10^9 | 1 in 40
Sus scrofa (pig) | Build 145 (Jul 31, 2015) | 135.5 | 60.4 | 3.13 x 10^9 | 1 in 52
Drosophila melanogaster (fruit fly) | Build 148 (Jun 24, 2016) | 5.2 | 5.2 | 0.176 x 10^9 | 1 in 34

* Taken from NCBI dbSNP; www.ncbi.nlm.nih.gov/dbvar/content/org_summary/ (accessed Dec 19, 2016).
† subSNP stands for “submitted SNP” and is defined as a SNP submitted since the last build that was found to be distinct from refSNPs after multiple cycles of BLAST analyses.
‡ refSNP, or reference SNP cluster, is defined as a SNP or group of SNPs that map to a specific genomic sequence region. The SNPs of an existing build are all refSNPs. In creating a new build the refSNPs from the prior build and new subSNPs are both compared to updated genome sequence data to minimize duplications among refSNPs and subSNPs. This process will assign subSNPs to existing refSNP clusters or new refSNPs.
§ Gregory, T.R. (2005). Animal Genome Size Database; www.genomesize.com (accessed Dec 19, 2016); where genome size (bp) = (0.978 x 10^9) x DNA content (pg).
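
The "1 in N" values in Table 1 follow from dividing genome size by the number of refSNPs. The following is a minimal Python sketch of that arithmetic using the table's own numbers. The function and variable names are illustrative and are not part of any NCBI tool, and the result is a genome-wide average that assumes nothing about how SNPs are distributed along the sequence.

```python
# Values copied from Table 1: (refSNPs in millions, genome size in bp).
TABLE_1 = {
    "Homo sapiens": (154.2, 3.40e9),
    "Bos taurus": (100.2, 3.62e9),
    "Mus musculus": (80.4, 3.23e9),
    "Sus scrofa": (60.4, 3.13e9),
    "Drosophila melanogaster": (5.2, 0.176e9),
}


def pg_to_bp(dna_content_pg):
    """Convert DNA content (pg) to genome size (bp), per the Table 1 footnote."""
    return 0.978e9 * dna_content_pg


def bases_per_snp(refsnp_millions, genome_size_bp):
    """Average number of bases per refSNP, rounded to the nearest base."""
    return round(genome_size_bp / (refsnp_millions * 1e6))


for species, (refsnps, genome_bp) in TABLE_1.items():
    print(f"{species}: 1 SNP in {bases_per_snp(refsnps, genome_bp)} bases")
```

Running the loop reproduces the last column of Table 1 (22, 36, 40, 52, and 34 bases per SNP).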

Taking SNPs into account when designing PCR/qPCR assays

Given the high frequency of SNP occurrence, it is unrealistic to try to avoid SNPs altogether when designing your PCR/qPCR assays. However, it is important to consider their positioning, if located within a primer or probe sequence. Performing PCR using primers and probe sequences that overlie SNP sites can either dramatically impact a reaction or can have little to no impact at all. Specifically, the position of SNPs underlying a primer or probe can influence primer and probe melting temperature (Tm), efficiency of polymerase extension (non-proofreading polymerases like Taq), and even target affinity. To obtain the best data, it is important to know how your assay designs overlie SNPs and manage this positioning.

Positional effects. SNPs that occur in primer and/or probe binding sites can destabilize oligonucleotide binding and reduce target specificity. Mismatches can affect the hybridization of oligos, reducing the Tm of an oligonucleotide by as much as 5–18°C (Figure 2). The degree of effect on Tm depends on the mismatch position, type of mismatch (e.g., A/A, A/C, G/T), and the surrounding environment/sequence [3]. When probes hybridize, the destabilizing effects are highest for mismatches located in the interior of the duplex [4–6]. Mismatches at the terminus or penultimate position (1 or 2 base pairs from the terminus) are less discriminatory [4,7]. Use the free, online IDT OligoAnalyzer™ Tool to make such predictions.

Figure 2. Significant decrease in probe or primer melting temperature from a single mismatch. The example shows how a single mismatch can alter probe or primer melting temperature, affecting the efficiency of the PCR and, ultimately, the interpretation of experimental results. These particular mismatches create non-standard base pairing that should not disrupt the helix. Even so, a single mismatch can decrease melting temperature by over 8°C (compare the Tm values highlighted with green and red arrows). The screenshots show output from the free, online OligoAnalyzer™ Tool.
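
Before predicting Tm for a mismatched probe, it helps to know where each mismatch sits in the duplex, because interior mismatches are reported to be more destabilizing than terminal or penultimate ones. The following Python sketch only locates and labels mismatches; it does not calculate Tm, and it is not how the OligoAnalyzer Tool works internally. The function name and example sequences are hypothetical.

```python
def probe_mismatches(probe, site):
    """Locate mismatches between a probe and the sequence it is meant to match.

    Both strings are written 5'->3' in probe orientation and must be the
    same length. Returns (index, probe_base, site_base, location) tuples,
    where location is 'terminal', 'penultimate', or 'interior'.
    """
    if len(probe) != len(site):
        raise ValueError("probe and site must be the same length")
    last = len(probe) - 1
    results = []
    for i, (p, s) in enumerate(zip(probe.upper(), site.upper())):
        if p == s:
            continue
        edge = min(i, last - i)  # distance to the nearer end of the duplex
        location = ("terminal", "penultimate")[edge] if edge < 2 else "interior"
        results.append((i, p, s, location))
    return results


# Hypothetical 8-base example with one interior difference at index 3.
print(probe_mismatches("ACGTACGT", "ACGAACGT"))
```

Mismatches labeled "interior" are the ones most worth checking with a Tm prediction tool.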

SNPs underlying primers can exert additional positional effects on polymerase binding. (Note that the OligoAnalyzer tool does not take polymerase destabilization effects into account.) In their article, "Single-nucleotide polymorphisms and other mismatches reduce performance of quantitative PCR assays" [8], Lefever and colleagues demonstrated that the impact of a given mismatch correlated with its distance from the 3’ end of the primer. Mismatches closest to the 3’ end, typically within the last 5 bases, had the most dramatic effect on amplification. Mismatches at the terminal 3’ base produced the largest shift in Cq relative to perfect matches, altering Cq by as much as 5–7 cycles (a 32- to 128-fold difference, depending on the master mix used; Figure 3).

Figure 3. Mismatches at the 3’ end of primers reduce qPCR performance. The data show the difference in Cq (ΔCq) between perfect match and mismatch primers as a function of the position of a single mismatch, using 5 different master mixes (A, B, C, D, and E). p values were calculated using one-way analysis of variance (ANOVA). The shift due to a SNP at the 3’ end of a primer can be as large as 7 Cq, representing an apparent 128-fold change in gene expression, depending on the master mix used. (Data adapted from Lefever et al. [8], with permission of the publisher.)
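
The fold differences quoted for a 5–7 cycle Cq shift come from the doubling of product in each cycle. The following is a short Python sketch of that conversion. It assumes 100% amplification efficiency by default, which is an idealization; the function name and the efficiency parameter are illustrative.

```python
def fold_difference(delta_cq, efficiency=1.0):
    """Apparent fold difference implied by a Cq shift.

    efficiency=1.0 means perfect doubling each cycle (an assumption);
    real assays usually fall somewhat below that.
    """
    return (1.0 + efficiency) ** delta_cq


for shift in (5, 7):
    print(f"delta Cq = {shift}: {fold_difference(shift):.0f}-fold")
```

With perfect doubling, shifts of 5 and 7 cycles correspond to 32-fold and 128-fold differences. Passing a lower efficiency (for example, 0.9) gives smaller fold values for the same shift.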
 

Base composition effects. Lefever and colleagues [8] also showed that reactions containing purine/purine and pyrimidine/pyrimidine mismatches at the 3’ terminal position in the primer produced larger Cq values (mismatch vs. perfect match) and reduced end-point fluorescence values, with A/G and C/C showing the largest Cq differences compared to perfect matches.

Their data demonstrated that the shift in Cq between a perfect-matched oligo/target and an oligo/target with a single mismatch decreased with increasing distance of the mismatch from the 3’ end [8]. Single mismatches located more than 5 nucleotides from the 3’ end could still have a moderate effect on qPCR amplification. Further experiments by this group showed that the reduction in Tm and shift in Cq were exacerbated when SNPs were present in both primers (forward and reverse) or when more than one mismatch occurs within a given primer [8].
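
The two factors described above, distance from the 3’ end and mismatch type, can be read directly from a primer and the variant allele it would anneal to. The following Python sketch reports both for each mismatch. It only classifies mismatches and does not predict Cq shifts; the function name, the output fields, and the use of a 5-base cutoff as a flag are assumptions made for illustration.

```python
PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def primer_mismatches(primer, variant_site):
    """Describe mismatches between a primer and a variant allele.

    primer and variant_site are equal-length strings, 5'->3', in primer
    orientation. The template base that faces the primer is the
    complement of the variant_site base.
    """
    if len(primer) != len(variant_site):
        raise ValueError("primer and variant_site must be the same length")
    report = []
    for i, (p, v) in enumerate(zip(primer.upper(), variant_site.upper())):
        if p == v:
            continue
        template_base = COMPLEMENT[v]
        p_is_purine = p in PURINES
        t_is_purine = template_base in PURINES
        if p_is_purine and t_is_purine:
            kind = "purine/purine"
        elif not p_is_purine and not t_is_purine:
            kind = "pyrimidine/pyrimidine"
        else:
            kind = "purine/pyrimidine"
        from_3prime = len(primer) - 1 - i  # 0 = terminal 3' base
        report.append({
            "from_3prime": from_3prime,
            "pair": f"{p}/{template_base}",
            "kind": kind,
            "near_3prime": from_3prime < 5,  # within the last 5 bases
        })
    return report


# Hypothetical primer whose terminal 3' base faces a variant allele.
print(primer_mismatches("ACGTA", "ACGTT"))
```

In this example the primer's terminal A faces a template A, a purine/purine mismatch at the 3’ terminal position, which is the kind of case reported to give the largest Cq differences.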

The free, online OligoAnalyzer tool allows researchers to set mismatches and then calculate Tm. Users can examine potential hairpin and dimer formation using this tool. The DECODED article, Determining the physical characteristics of your oligos—The OligoAnalyzer Tool, provides guidance on how to identify these characteristics.

Effect on qPCR amplification. In many cases, a single SNP may not prevent amplification but can cause inefficient annealing and amplification [4]. This can delay Cq, leading to an underestimation of gene expression or even an apparent copy number loss in SNP-containing sequences.

Using a modified single-base extension assay, Wu and colleagues [9] investigated how the type and position of a mismatch affected extension efficiency during the initial PCR cycle. They concluded that mismatches within the last 3–4 bases of the 3’ end of the primer blocked primer extension. Wu et al. attributed the low extension efficiency to reduced binding of the DNA polymerase. While other research groups have contested this finding, describing a similar affinity of DNA polymerase for correctly paired and mispaired duplexes [10], Lefever and colleagues [8] confirm and extend the results from Wu et al.

Safeguard your experiments

Researchers often adopt primer and probe sequences identified in prior publications. It can be tempting to use legacy published or “lab-validated” RT-PCR assay designs. However, given the continual addition of new sequence information, it is important to reevaluate and understand the location of SNPs relative to primer and probe sequences in your PCR/qPCR assays. The following are tips for managing SNP impact on your assay results:

  • To obtain an up-to-date list of possible SNPs in your sequence, scroll down to the Alignments section of your BLAST search results page, and click on Graphics at the top left. At the top right of the sequence graphic, click on Tracks and select the Variation tab. From there you can select the type of SNPs for which you want information.
  • If the “rs” number—the Reference SNP cluster ID (accession number) that refers to a specific SNP—is known, check SNP information in NCBI dbSNP.
  • If a SNP is identified, check whether the frequency of the SNP (minor allele frequency, or MAF) is relevant in your population.
  • When you cannot avoid a SNP underlying your probe sequence, use the free, online IDT OligoAnalyzer Tool to predict the Tm of mismatched probe sequences.
  • In cases where a SNP underlies a primer sequence, minimize SNP effects by positioning the SNP towards the 5’ end of the primer. For help with such designs, contact us.
  • For genotyping experiments where relevant SNPs occur adjacent to your SNP of interest, avoid allele dropout by using mixed bases (Ns) or inosines in the primer or probe to cover the adjacent site(s).
  • Since genomic information is constantly in flux, recheck previously used primer and probe sequences for underlying SNPs.
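
Several of the tips above can be combined into a quick check once you have a list of known SNPs for your target region. The following Python sketch flags SNPs that fall under a primer, reports how far each is from the 3’ end, skips SNPs below a minor allele frequency cutoff, and proposes a mixed-base version of the primer using standard IUPAC codes. The function name, the tuple layout for SNPs, the example data, and the default cutoff of 0.01 are all assumptions for illustration. The sketch handles only a primer written on the same strand and coordinate system as the SNP list, so a reverse primer would need to be reverse-complemented first.

```python
# IUPAC mixed-base codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def review_primer(primer, primer_start, snps, min_maf=0.01):
    """Flag SNPs under a forward-strand primer and propose a mixed-base version.

    primer_start is the 0-based reference coordinate of the primer's 5' base.
    snps is a list of (position, alleles, maf) tuples on the same strand and
    coordinate system; alleles is a string of two or more of A, C, G, T.
    SNPs with maf below min_maf (an assumed cutoff) are ignored.
    Returns (flagged, mixed_primer).
    """
    bases = list(primer.upper())
    flagged = []
    for position, alleles, maf in snps:
        offset = position - primer_start
        if not 0 <= offset < len(bases) or maf < min_maf:
            continue
        flagged.append({
            "position": position,
            "from_3prime": len(bases) - 1 - offset,  # 0 = terminal 3' base
            "maf": maf,
        })
        bases[offset] = IUPAC[frozenset(alleles.upper())]
    return flagged, "".join(bases)


# Hypothetical primer and SNP list (positions, alleles, and MAFs are made up).
snps = [(103, "TC", 0.12), (110, "CA", 0.001), (250, "AG", 0.3)]
flagged, mixed = review_primer("ACGTTGCAAGCT", 100, snps)
print(flagged)
print(mixed)
```

Here only the SNP at position 103 is flagged: the one at 110 falls below the frequency cutoff and the one at 250 lies outside the primer. A flagged SNP with a small `from_3prime` value is a candidate for redesign rather than for a mixed base.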

Adopting a new paradigm in assay design

SNPs are now a regular occurrence, with more discovered every day. It is no longer practical, or even possible, to avoid them when designing PCR/qPCR assays. This means we must adjust our thinking about experimental design, and design our PCR/qPCR assays intelligently, with SNPs in mind.

SNPs—Defined

Single nucleotide polymorphisms, or “SNPs,” are single base positions within a stretch of DNA that differ in sequence among individuals or populations. They define distinct alleles or mutations and can occur within coding or noncoding sequences. The effects of SNPs vary, and SNPs occur at different frequencies in different populations. When they occur within coding sequences, they do not always affect the translated amino acid sequence; that is, some SNPs are silent. However, even when SNPs occur outside coding regions, they may still affect gene expression by altering gene splicing, transcription factor binding, or mRNA degradation.

SNPs play an extremely valuable role in the evolution of species diversity and are valuable in vitro as a tool for individual and species identification. Human beings share 99.9% of their genetic code. Of the 0.1% that differs among us, >80% is due to SNPs [11]. Other sequence variations include insertions, deletions, and transpositions.

References

  1. NCBI Variation Summary. www.ncbi.nlm.nih.gov/dbvar/content/org_summary/. Accessed Dec 19, 2016.
  2. Animal Genome Size Database. 2005. www.genomesize.com/. Accessed Dec 19, 2016.
  3. Owczarzy R, Tataurov AV, Wu Y, et al. IDT SciTools: a suite for analysis and design of nucleic acid oligomers. Nucleic Acids Res. 2008;36(Web Server issue):W163-169.
  4. Letowski J, Brousseau R, Masson L. Designing better probes: effect of probe size, mismatch position and number on hybridization in DNA oligonucleotide microarrays. J Microbiol Methods. 2004;57(2):269-278.
  5. You Y, Moreira BG, Behlke MA, et al. Design of LNA probes that improve mismatch discrimination. Nucleic Acids Res. 2006;34(8):e60.
  6. SantaLucia J, Jr., Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004;33:415-440.
  7. Urakawa H, Noble PA, El Fantroussi S, et al. Single-base-pair discrimination of terminal mismatches by using oligonucleotide microarrays and neural network analyses. Appl Environ Microbiol. 2002;68(1):235-244.
  8. Lefever S, Pattyn F, Hellemans J, et al. Single-nucleotide polymorphisms and other mismatches reduce performance of quantitative PCR assays. Clin Chem. 2013;59(10):1470-1480.
  9. Wu JH, Hong PY, Liu WT. Quantitative effects of position and type of single mismatch on single base primer extension. J Microbiol Methods. 2009;77(3):267-275.
  10. Huang MM, Arnheim N, Goodman MF. Extension of base mispairs by Taq DNA polymerase: implications for single nucleotide discrimination in PCR. Nucleic Acids Res. 1992;20(17):4567-4573.
  11. Piazza A. Theory of evolution and genetics. In: Fasolo A, ed. The Theory of Evolution and Its Impact. Milano: Springer; 2012.

For research use only. Not for use in diagnostic procedures. Unless otherwise agreed to in writing, IDT does not intend these products to be used in clinical applications and does not warrant their fitness or suitability for any clinical diagnostic use. Purchaser is solely responsible for all decisions regarding the use of these products and any associated regulatory or legal obligations. Doc ID: RUO22-1592_001

Published Jan 31, 2017
Revised/updated Feb 17, 2023