Part II: It’s all about conversion
By now, you have probably read previous blog posts by Eric van der Walt and myself, in which we identified coverage uniformity as paramount in the “pursuit of NGS happiness”. We argued that, to achieve the best coverage uniformity, it is critical to construct the library with the highest complexity and lowest duplication rate possible from the DNA available for sequencing.
For Illumina sequencing, adapters carrying the motifs required for cluster amplification and (multiplexed) sequencing are added to the “random” DNA fragments that are to be sequenced. This can be achieved via PCR, tagmentation or ligation. PCR-based protocols are typically specialized (and often associated with amplicon libraries). Although tagmentation-based protocols are gaining popularity due to their speed and convenience, the workflow improvements typically come with costs to robustness and library quality (watch this space for future blogs on the limitations of tagmentation). For many applications, ligation-based library construction protocols are still considered the gold standard.
In short, ligation-based kits and protocols are all based on the same strategy:
- Sheared DNA is end-repaired to create 5’-phosphorylated, blunt-ended dsDNA molecules.
- These molecules are 3’-dA tailed (unless blunt-ended adapters are used in the next step).
- Adapters with 3’-dT overhangs are ligated to the A-tailed library fragments.
- If necessary, libraries are amplified by PCR to enrich for adapter-ligated molecules and generate a sufficient mass of library for the next step in the process (sequencing or target enrichment).
Libraries created with different kits and protocols are, however, not equal by any means. The percentage of input DNA converted to adapter-ligated molecules (from the same sample and input) can range considerably: from <0.5% to >50%. Conversion rate is the metric that ultimately dictates library complexity (diversity). It also determines how much library amplification will be required, and therefore affects duplication rates. Although library amplification is equally important, I will focus on conversion rates in this post. To design and optimize a library construction workflow, particularly for low-input or challenging samples (such as FFPE, cell-free or ChIP DNA), it is important to be aware of the key factors that affect conversion rates, i.e. the % of input DNA converted to sequenceable adapter-ligated library. These are:
1. Enzyme quality and formulation
Commercially available enzymes or enzyme mixes used for end repair, A-tailing and ligation are similar in nature and formulation. Most reputable vendors provide highly purified enzymes, but the amount of enzyme per reaction varies between kits. Reagents that are supplied as master mixes (enzymes and buffers already come premixed) are convenient to use, but are less stable than enzymes/enzyme mixes and reaction buffers supplied in separate tubes, and the performance of master mixes deteriorates more rapidly over time. Since specifications are not always available, the best bet is to pick reagents that are recommended for library construction from a wide range of inputs, and kits that offer the longest shelf life.
2. The library construction strategy (protocol)
Three strategies are commonly used to generate fragment libraries for Illumina sequencing:
- In “traditional” protocols, each enzymatic reaction (end repair, A-tailing and ligation) is performed in a separate tube, and followed by a cleanup step (using columns, or more commonly, SPRI beads). This type of protocol typically results in a considerable loss of material due to the physical transfer of material from one reaction vessel to another.
- When employing a “with-bead” protocol (Fisher et al., Genome Biology 2011, 12: R1), the three consecutive enzymatic reactions are performed in the same tube, and SPRI beads are re-used for the cleanups after each reaction. Library diversity is preserved by reducing the losses associated with the transfer of material between reaction vessels.
- “Streamlined”, “rapid” or “fast” protocols are becoming increasingly popular. In these protocols, end repair and A-tailing are usually combined in a single reaction, performed with a 2-step temperature incubation. Ligation reagents and adapters are added directly to the product of this end repair/A-tailing (ER/AT) reaction, without a cleanup in between. Removing the cleanups between enzymatic reactions can lead to significantly lower yields of adapter-ligated library (as compared to a well-optimized “with-bead” protocol). However, streamlined protocols can outperform all other protocols (especially with challenging samples) if the reaction chemistry and parameters have been carefully optimized.
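Why cleanup losses matter so much can be seen with a back-of-envelope model in which every reaction and every cleanup passes on only a fraction of the molecules it receives, so the fractions multiply. The sketch below compares a “traditional” and a “with-bead” protocol that differ only in cleanup recovery, then converts the resulting conversion rate into an approximate number of adapter-ligated molecules. All per-step efficiencies, the 1 ng input, the 200 bp mean fragment length and the ~660 g/mol per base pair figure are illustrative assumptions, not measured values. The model deliberately ignores the fact that omitting a cleanup can itself change the efficiency of the next reaction, which is why it is not used here to judge streamlined protocols.

```python
from math import prod

AVOGADRO = 6.022e23        # molecules per mole
G_PER_MOL_PER_BP = 660.0   # approximate average mass of one dsDNA base pair (g/mol)


def overall_conversion(reaction_efficiencies, cleanup_recoveries):
    """Fraction of input DNA ending up as adapter-ligated library.

    Assumes each step acts independently on the molecules that survived the
    previous step, so the per-step fractions simply multiply.
    """
    return prod(reaction_efficiencies) * prod(cleanup_recoveries)


def ligated_molecules(input_ng, mean_fragment_bp, conversion):
    """Approximate number of adapter-ligated molecules (an upper bound on diversity)."""
    moles = input_ng * 1e-9 / (mean_fragment_bp * G_PER_MOL_PER_BP)
    return moles * AVOGADRO * conversion


# Hypothetical per-step values, chosen only to illustrate compounding losses.
REACTIONS = [0.9, 0.9, 0.4]  # end repair, A-tailing, ligation

traditional = overall_conversion(REACTIONS, [0.7, 0.7, 0.7])  # new tube per cleanup
with_bead = overall_conversion(REACTIONS, [0.9, 0.9, 0.9])    # beads re-used in one tube

for name, conv in (("traditional", traditional), ("with-bead", with_bead)):
    n = ligated_molecules(1.0, 200, conv)
    print(f"{name}: {conv:.1%} conversion, ~{n:.2e} molecules from 1 ng")
```

With these made-up numbers, raising the recovery of each of the three cleanups from 70% to 90% roughly doubles the overall conversion rate, and with it the number of unique molecules available to the library.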
3. Adapter:insert molar ratio
A molar excess of adapter is needed to ensure that most library fragments end up with adapters ligated to both ends. To achieve optimal results with higher inputs (≥100 ng) of good-quality DNA and a “with-bead” protocol, an adapter:insert molar ratio of at least 10:1 is recommended. Conversion rates for low-input protocols and/or challenging samples can be improved with higher adapter concentrations. The optimal adapter concentration depends on the library construction chemistry and protocol. With the Solix LTP/HTP Library Preparation Kit (employing a highly optimized “with-bead” protocol), adapter:insert molar ratios in the range of 40:1 – 80:1 (possibly up to 100:1) have been shown to yield higher quality libraries from FFPE and cell-free DNA samples. With the streamlined, one-tube Solix Hyper Prep chemistry, adapter:insert molar ratios as high as 1000:1 have been demonstrated to improve library diversity when making libraries from low nanogram inputs.
At excessively high adapter:insert molar ratios (within the context of a specific chemistry) adapter-dimer formation dominates. Since an excess of adapter is required to ensure the highest possible ligation efficiency, post-ligation cleanups must be optimized to eliminate unused adapter and adapter-dimer prior to target enrichment, cluster amplification or library quantification.
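Turning an adapter:insert ratio into an adapter stock concentration is simple arithmetic once the input mass is expressed in moles of fragments. The sketch below assumes ~660 g/mol per base pair of dsDNA, treats the mean fragment length as representative of the whole distribution, and uses a hypothetical 5 µL of adapter per ligation; `insert_pmol` and `adapter_stock_uM` are illustrative helper names, not part of any kit protocol.

```python
G_PER_MOL_PER_BP = 660.0  # approximate average mass of one dsDNA base pair (g/mol)


def insert_pmol(input_ng: float, mean_fragment_bp: float) -> float:
    """Picomoles of insert fragments in input_ng of dsDNA of the given mean length."""
    return input_ng * 1e3 / (mean_fragment_bp * G_PER_MOL_PER_BP)


def adapter_stock_uM(input_ng: float, mean_fragment_bp: float,
                     ratio: float, adapter_volume_ul: float) -> float:
    """Adapter stock concentration (uM) that delivers `ratio` x the insert pmol
    in `adapter_volume_ul` microliters (1 pmol/uL is the same as 1 uM)."""
    adapter_pmol = insert_pmol(input_ng, mean_fragment_bp) * ratio
    return adapter_pmol / adapter_volume_ul


# Hypothetical scenarios: 200 bp mean insert, 5 uL of adapter per reaction.
for ng, ratio in ((100, 10), (100, 80), (1, 1000)):
    print(f"{ng} ng at {ratio}:1 -> {adapter_stock_uM(ng, 200, ratio, 5):.2f} uM stock")
```

In this example, 1 ng at 1000:1 and 100 ng at 10:1 call for the same absolute amount of adapter; what changes is the excess of adapter relative to insert, which is what the post-ligation cleanup has to deal with.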
4. Adapter design and quality
Two types of “universal” (“Y-shaped” or “forked”) adapters are typically used in ligation-based library construction protocols for Illumina sequencing, namely “full-length” and “incomplete” adapters.
Full-length adapters carry all the sequence motifs needed for cluster amplification, indexing and paired-end sequencing, i.e. library molecules are flanked by “P5” and “P7” flow cell oligo sequences after ligation. In contrast, incomplete adapters have shorter, non-complementary “arms”, and motifs needed for indexing and cluster amplification are added to one or both strands during a library amplification step (in target capture workflows, typically the post-capture PCR). Full-length adapters are required for workflows in which libraries are pooled for sequencing directly after ligation (PCR-free workflows), and for multiplexed target capture workflows (pre-capture pooling).
The molecular structure of full-length adapters is intrinsically unstable: two ~60-nt oligos are annealed by a region of complementarity constituting only about 20% of the length of each oligo. When using full-length adapters, it is therefore critical to ensure that adapters are duplexed and diluted in a buffered solution of appropriate ionic strength, and that exposure to room temperature and the number of freeze-thaw cycles are limited.
Incomplete adapters have a more stable structure (a larger proportion of each oligo is annealed to the other) and typically result in higher conversion rates (everything else being equal), but should nevertheless be treated with similar care.
Whether you are using full-length or incomplete adapters, always ensure that oligos are of high purity (i.e. devoid of 5’- or 3’-truncated contaminants) and carry the appropriate functional elements to support TA-ligation and ensure maximum stability.
Some Illumina library construction protocols employ custom adapter designs. When considering the most appropriate strategy for your project, remember that blunt-ended ligation is typically less efficient than TA-ligation, and that “non-universal” adapters (two distinct adapter structures used during ligation) can result in so-called “AA” or “BB” (as opposed to “AB”) fragments, which cannot be sequenced. In such cases, exceptionally high conversion rates are required to achieve the same library diversity as with TA-ligated, universal adapters.
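The cost of “non-universal” adapters can be estimated with a simple model, assuming that both ends of every fragment are ligated and that each end independently receives adapter A or adapter B in proportion to the amounts present. Under that assumption, at most half of the double-ligated fragments are “AB” when A and B are used in equal amounts, so roughly twice the conversion rate would be needed to match a universal Y-adapter design in which (in this simplified picture) every double-ligated fragment is usable. The closed-form expression and a seeded Monte Carlo check are shown below; the function names are illustrative.

```python
import random


def expected_ab_fraction(p_a: float = 0.5) -> float:
    """Expected fraction of A/B fragments when each end independently gets
    adapter A with probability p_a and adapter B otherwise."""
    return 2 * p_a * (1 - p_a)


def simulate_ab_fraction(n_fragments: int, seed: int = 0) -> float:
    """Monte Carlo check of the same model with equal amounts of A and B."""
    rng = random.Random(seed)
    ab = 0
    for _ in range(n_fragments):
        end1, end2 = rng.choice("AB"), rng.choice("AB")
        if end1 != end2:  # AA and BB fragments cannot be sequenced
            ab += 1
    return ab / n_fragments


print(f"expected A/B fraction: {expected_ab_fraction():.2f}")
print(f"simulated A/B fraction: {simulate_ab_fraction(100_000):.3f}")
```

Unequal adapter amounts only make matters worse in this model: the A/B fraction, 2p(1 - p), is highest at p = 0.5.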
5. Size selection
The length of a library fragment determines the size of the corresponding cluster generated during cluster amplification. Short molecules (including adapter-dimers) cluster very efficiently, whereas fragments >700 bp typically do not. Clusters that are too small or too big will not result in usable sequence reads, and therefore reduce sequencing capacity and coverage. For some sequencing applications, library fragments of a specific or uniform size are critical for sequence analysis.
When designing a library construction workflow, it is important to take these factors into consideration. Library diversity is best preserved if input DNA can be fragmented to a size distribution that is optimal for the sequencing read length and application. If this is not achievable, size selection (the removal of both small and large fragments) will have to be performed. Irrespective of whether size selection is performed with an electrophoretic device (e.g. the Pippin Prep from Sage Science) or with SPRI beads, it always results in a significant (60 – 95%) loss of material, not only because subsets of fragments are deliberately excluded, but also because DNA recovery from these techniques is intrinsically low. For low-input protocols in particular, it is therefore important to determine whether size selection can be avoided. If not, the most appropriate stage for size selection (before end repair, after adapter ligation or after library amplification) should be determined empirically for the specific sample type and library construction strategy used.
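One way to put the 60 – 95% loss from size selection in perspective is to ask how many extra PCR cycles it would take to make up the lost mass. The sketch below assumes perfect doubling per cycle (real PCR is less efficient) and uses a hypothetical helper, `extra_pcr_cycles`. The key point is that these cycles only restore mass, not diversity: the unique molecules removed by size selection are gone, and the additional copies of the survivors raise duplication rates.

```python
import math


def extra_pcr_cycles(loss_fraction: float) -> float:
    """Additional PCR cycles needed to recover the mass lost in size selection.

    Assumes perfect doubling per cycle, which real PCR only approximates.
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    return math.log2(1.0 / (1.0 - loss_fraction))


for loss in (0.60, 0.80, 0.95):
    print(f"{loss:.0%} loss -> ~{extra_pcr_cycles(loss):.1f} extra cycles; "
          f"{1 - loss:.0%} of unique molecules retained")
```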
6. The amount of input material
Input DNA is “lost” at each step of the library construction process, as enzymatic reactions are not 100% efficient, and DNA recoveries from cleanups (if performed) are not perfect. Ligation is generally regarded as the least efficient of the three enzymatic processes, and ligation efficiency decreases with decreasing DNA input. Low-quality DNA (such as FFPE samples) may not be end-repaired and/or A-tailed as efficiently as good-quality DNA, which further reduces the number of molecules available for adapter ligation. When designing and optimizing library construction protocols for low-input (