The witchcraft of knockouts

Friday 27 May 2016

Making knockouts has a bad rep, but there seems to be a conspiracy afoot to make it feel like a mystic art.




The tool of the trade is the recombinase of λ phage, whose nomenclature is itself confusing. The complex, called Red, consists of three peptides encoded by exo, bet and gam, possibly the strangest gene symbols. exo was α until it got renamed, and it is unclear why the Greek letters had to be truncated to three letters, as the rest of the genes of λ phage are not Greek-themed (sadly), nor do they obey common gene symbol rules. The recombinase is called Red because it was thought to be a single gene, red, which plays a role in recombination after UV damage.

pKD46

The first researchers to use it properly were Datsenko and Wanner in 2000. In their PNAS paper they give the world some great plasmids: pKD3, pKD4, pKD13 and, most importantly, pKD46 and pCP20. Okay, pCP20 was not made by them (KD = Kirill Datsenko) but by Peter Cherepanov five years prior, which, like many things, is unfortunately hard to tell from the paper.
So what they gave the world was an effective system to knock out genes. First one PCRs out a resistance cassette from pKD3, pKD4 or pKD13 and transforms it into a strain carrying pKD46 (which carries the λ Red genes). The resistant strain is then cured of the temperature-sensitive plasmid so that it can be transformed with pCP20, which is otherwise somewhat incompatible. pCP20 then flips out the FRT-flanked cassette, leaving only a scar.
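The last step, flipping out the cassette, can be pictured as a string operation. The following is a minimal sketch, not a model of the real enzyme: it assumes two identical FRT sites in direct orientation and simply collapses everything between them into a single FRT. The FRT sequence and the 28 nt and 19 nt flanks are copied from the pKD13 cassette in the appendix; the run of N is a placeholder standing in for the kanamycin gene region, and `flp_excise` is a name made up for illustration.

```python
# 34-nt FRT site as it appears twice in the pKD13 cassette (see appendix).
FRT = "GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC"


def flp_excise(seq: str, frt: str = FRT) -> str:
    """Collapse everything between the first and last FRT into a single FRT.

    Simplified model: assumes two identical FRT sites in direct orientation.
    """
    first = seq.find(frt)
    last = seq.rfind(frt)
    if first == -1 or first == last:
        raise ValueError("need two FRT sites in direct orientation")
    return seq[:first] + frt + seq[last + len(frt):]


# Cassette ends taken from the appendix record; the middle is a placeholder.
head = "ATTCCGGGGATCCGTCGACCTGCAGTTC"   # bases 1..28, before the first FRT
tail = "GAAGCAGCTCCAGCCTACA"            # bases 1285..1303, after the second FRT
placeholder = "N" * 1188                # stands in for bases 63..1250
cassette = head + FRT + placeholder + FRT + tail

scar = flp_excise(cassette)
print(len(cassette), "->", len(scar), "nt")
print(scar)
```

Whatever remains after excision in this toy model is just the two cassette ends around one FRT site; the homology arms added by the primers sit outside it and are untouched.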

The first oddity is that there are two plasmids, and that they cannot be used together. However, this was an improvement over previous plasmids that did not work; the paper seems to have taken a lot of trial and error. pCP20 has both cat (chloramphenicol) and bla (ampicillin) and the same origin (pSC101 ts variant), but the major reason is that the flippase is under the control of the λ cI857 repressor, which should mean its expression is temperature induced; being leaky, however, it works fine at 30°C. I even saw an online protocol that added IPTG: clearly someone gave up on trying to figure out the badly documented plasmid and guessed a lac promoter. This just shows how frustrating the lack of information is.

The second oddity is the primers. pKD3 and pKD4 use the same primer pair for amplification, while pKD13 shares one primer with them and uses a different second one. That would be okay, were it not for the fact that in the KEIO paper (discussed below) the names change and the sequences are hidden from sight.
Priming site 4 specific for pKD13 became P1 in the KEIO paper:
ATTCCGGGGATCCGTCGACC
Priming site 1 became P2:
TGTAGGCTGGAGCTGCTTCG
While priming site 2, specific for pKD3 and pKD4, is:
CATATGAATATCCTCCTTA
A crazy hidden difference is that the respective Tm are 61.6°C, 60°C and… 42.5°C (hence, probably, why pKD4 was discarded in favour of pKD13). These primer sequences are added to the 3' end of the homology sequences. The result is like a very long PCR primer, with a homology part that is at least 40 nt long. It is impossible to get a working H+P oligo that is under the 60-base cutoff for a regular oligo at IDT, yet I have heard of many people trying, pointlessly; that is indeed one of the major flaws. A cool innovation that came later was to protect the last few bases with phosphorothioate bonds, which cost about $5 each, so the price stacks up.
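To make the comparison concrete, here is a rough sketch that tabulates length, GC content and a crude Tm for the three priming sites, and then assembles an H+P oligo pair. Note the assumptions: the Tm uses the simple Wallace rule (2°C per A/T, 4°C per G/C), which will not reproduce the values quoted above because those come from a different calculator, though the ranking is the same. The function names, the 50 nt arm length and the made-up flank sequences are all illustrative, and the orientation (upstream arm + P1, reverse complement of downstream arm + P2) is one plausible arrangement rather than a statement of what any paper did.

```python
PRIMING_SITES = {
    "priming site 4 (P1 in KEIO)": "ATTCCGGGGATCCGTCGACC",
    "priming site 1 (P2 in KEIO)": "TGTAGGCTGGAGCTGCTTCG",
    "priming site 2": "CATATGAATATCCTCCTTA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude Wallace-rule Tm: 2 degrees per A/T plus 4 degrees per G/C."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def build_oligos(upstream: str, downstream: str, p_fwd: str, p_rev: str):
    """Homology arm at the 5' end, priming site at the 3' end (assumed layout)."""
    forward = upstream + p_fwd
    reverse = revcomp(downstream) + p_rev
    return forward, reverse


for name, seq in PRIMING_SITES.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Wallace Tm ~{wallace_tm(seq)} C")

# Hypothetical 50-nt flanks; real ones come from the target locus.
upstream_arm = "ACGTTGCA" * 6 + "AC"
downstream_arm = "TTGACCGA" * 6 + "TG"
fwd, rev = build_oligos(
    upstream_arm,
    downstream_arm,
    PRIMING_SITES["priming site 4 (P1 in KEIO)"],
    PRIMING_SITES["priming site 1 (P2 in KEIO)"],
)
print(len(fwd), len(rev))  # both well past a 60-nt limit
```

With 50 nt arms both oligos come out at 70 nt, and even a 40 nt arm plus a 20 nt priming site already reaches 60 nt, so there is no way to stay under that cutoff without cutting into the homology.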

A third oddity is the names of the deletions: DE(lacZYA)514∷cat sounds very accurate until it is revealed that the 514 is simply the name of the primers in their collection and not some arcane position or length. Also, why they could typeset λ and not ∆ is a bit of a mystery.

KEIO collection

The best demonstration of the power of the technology came 6 years later when the Mori group knocked out each and every E. coli gene. A monumental task published in Molecular Systems Biology: a good journal, but not really for a ten-person paper that took half a decade. The collection is present in most E. coli labs; it is that useful.
The paper, however, also presents some headache-inducing facts. The text says that H1 (homology region 1) binds upstream of the start codon and that P1 is attached to it to amplify the kan cassette. Yet in the (very useful) supplementary data the H2 sequences for genes on the opposite strand finish in ATG, and it cannot be told whether the cassette was made with an H2+P1 or an H2+P2 arrangement.
A curious fact is that Keio is not a backronym (it is the name of the university), unlike ASKA (2005), which stands for "A complete Set of E. coli K-12 ORF Archive" and is a top contender for the "most forced scientific backronym ever" award.
Lastly, the Keio collection lacks a few genes that aren't really essential essential as the cells were grown on LB regardless of the gene. As a result there are some peculiarities such as no rib genes as riboflavin (from the yeast extract) breaks down when autoclaved.

Recent innovations

Since then some fancy tricks have appeared. Phosphorothioated oligos have been used to knock stuff out or change sequences without selection. Delitto perfetto (perfect murder) is a great-sounding method in which a cassette is knocked in and then knocked out with an oligo; tetA is great for that, as fusaric acid counter-selects it. MAGE relies on multiple cycles of oligo transformation to introduce a plethora of defined changes.
The dual plasmid system has nonsensically plagued researchers for over a decade, and only recently has a plasmid that combines both become available (pSIJ8), with an arabinose-inducible Red and a rhamnose-inducible flippase. Also only recently have FRT-flanked alternative markers appeared (pSIJ196: spectinomycin, pSIJ197: gentamicin, pTKIP-hph: hygromycin, pTKIP-dhfr: trimethoprim, pFRT-Tet: tetracycline). Before that, pKD13 (neo against kanamycin) and pKD3 (cm against chloramphenicol) could each only be used once, and it seems a shame to have to switch plasmids to flip out the genes solely to return to pKD46.
A really interesting innovation is the use of I-Sce (which causes confusion when pronounced). This enzyme overcomes the limitation of the size of the insert by allowing a cassette to be recombined with a landing pad plasmid. Another alternative for large size insertion is clonetegration, assuming the location of the insertion is not important (as it uses att sites).

Failures

Nevertheless, I know of a lot of failures, my own and of colleagues. So I would like to finish with these golden rules that I just made up:
  1. don't be cheap: too short homology regions never work (copy pasting from the supplementary info from KEIO paper is however always good)
  2. don't try and be too fancy: if your construct has homology elsewhere it will recombine there (if you must do whatever fanciness you plan, use a homologue from a different species)
  3. don't use λ red to knock in anything as big as 5 kb (consider using I-Sce, cf. paper).
  4. don't make a sick strain without markers: it will be contaminated by other lab strains regardless of how good your aseptic technique is —I mean it.
  5. don't try an unselected mutation that results in a drop in fitness: you have to have a selection marker, no ifs or buts.
  6. don't try knocking out an essential gene: try to knock it out under conditions where it is not essential —there surely is a transporter out there.
  7. make a nice clean prep of template DNA, 200 ng at least —200 µl PCR reaction purified with Qiagen MinElute.
  8. fresh arabinose and a lot of it. The freshness is a mystery, but I have had failed knockouts work with brand new arabinose. Sigma does not say to store arabinose in the fridge but other suppliers do. There are two different philosophies: 0.02% arabinose for 3 hours or 2% for 45 minutes. I like the latter. These cells need to grow in LB, not in a medium with excess glucose (e.g. SOC), because of CRP.

Appendix


Here is a better annotated version of the cassette:
LOCUS       kan_cassette         1303 bp    DNA     linear SYN 11-SEP-2001
DEFINITION  Kan cassette, complete sequence.
ACCESSION   AY048744
VERSION     AY048744.1  GI:15554335
KEYWORDS    .
SOURCE      Template plasmid pKD13
  ORGANISM  Synthetic
            other sequences; artificial sequences; vectors.
REFERENCE   1  (bases 1 to 3434)
  AUTHORS   Datsenko,K.A. and Wanner,B.L.
  TITLE     One-step inactivation of chromosomal genes in Escherichia coli K-12
            using PCR products
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 97 (12), 6640-6645 (2000)
   PUBMED   10829079
REFERENCE   2  (bases 1 to 3434)
  AUTHORS   Datsenko,K.A. and Wanner,B.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-JUL-2001) Biological Sciences, Purdue University,
            Lilly Hall of Life Sciences, West Lafayette, IN 47907, USA
FEATURES             Location/Qualifiers
     primer_bind     1..20
                     /label="P1 KEIO"
     misc_feature    complement(1)
                     /note="priming site 4"
     source          complement(1..1303)
                     /organism="Template plasmid pKD13"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:170493"
     misc_feature    28..62
                     /note="distal 35-nt of natural FRT site"
     Region          29..62
                     /label="FRT site"
     misc_feature    29..62
                     /feature_type=Misc.
                     /label=FRT
     protein_bind    complement(29..40)
                     /label="FRT site"
     promoter        288..336
                     /label="Kan promoter"
     RBS             417..422
                     /label="Kan RBS"
     CDS             430..1224
                     /note="kanamycin resistance"
                     /codon_start=1
                     /transl_table=11
                     /product="Tn5 neomycin phosphotransferase"
                     /protein_id="AAL02037.1"
                     /db_xref="GI:15554336"
                     /translation="MIEQDGLHAGSPAAWVERLFGYDWAQQTIGCSDAAVFRLSAQGR
                     PVLFVKTDLSGALNELQDEAARLSWLATTGVPCAAVLDVVTEAGRDWLLLGEVPGQDL
                     LSSHLAPAEKVSIMADAMRRLHTLDPATCPFDHQAKHRIERARTRMEAGLVDQDDLDE
                     EHQGLAPAELFARLKARMPDGEDLVVTHGDACLPNIMVENGRFSGFIDCGRLGVADRY
                     QDIALATRDIAEELGGEWADRFLVLYGIAAPDSQRIAFYRLLDEFF"
     primer_bind     complement(481..500)
                     /label="k1 primer"
     misc_feature    complement(481..500)
                     /note="common priming site k1"
     misc_feature    complement(572..591)
                     /note="common priming site k2"
     primer_bind     591..610
                     /label="k2 primer"
     misc_feature    complement(1042..1061)
                     /note="common priming site kt"
     protein_bind    1213..1225
                     /label="FRT site"
     misc_feature    1237..1284
                     /note="natural FRT site"
     Region          1251..1284
                     /label="FRT site"
     misc_feature    1251..1284
                     /feature_type=Misc.
                     /label=FRT
     primer_bind     complement(1284..1303)
                     /label="P2 KEIO"
     misc_feature    complement(1285..1303)
                     /note="priming site 1"
ORIGIN
        1 ATTCCGGGGA TCCGTCGACC TGCAGTTCGA AGTTCCTATT CTCTAGAAAG TATAGGAACT
       61 TCAGAGCGCT TTTGAAGCTC ACGCTGCCGC AAGCACTCAG GGCGCAAGGG CTGCTAAAGG
      121 AAGCGGAACA CGTAGAAAGC CAGTCCGCAG AAACGGTGCT GACCCCGGAT GAATGTCAGC
      181 TACTGGGCTA TCTGGACAAG GGAAAACGCA AGCGCAAAGA GAAAGCAGGT AGCTTGCAGT
      241 GGGCTTACAT GGCGATAGCT AGACTGGGCG GTTTTATGGA CAGCAAGCGA ACCGGAATTG
      301 CCAGCTGGGG CGCCCTCTGG TAAGGTTGGG AAGCCCTGCA AAGTAAACTG GATGGCTTTC
      361 TTGCCGCCAA GGATCTGATG GCGCAGGGGA TCAAGATCTG ATCAAGAGAC AGGATGAGGA
      421 TCGTTTCGCA TGATTGAACA AGATGGATTG CACGCAGGTT CTCCGGCCGC TTGGGTGGAG
      481 AGGCTATTCG GCTATGACTG GGCACAACAG ACAATCGGCT GCTCTGATGC CGCCGTGTTC
      541 CGGCTGTCAG CGCAGGGGCG CCCGGTTCTT TTTGTCAAGA CCGACCTGTC CGGTGCCCTG
      601 AATGAACTGC AGGACGAGGC AGCGCGGCTA TCGTGGCTGG CCACGACGGG CGTTCCTTGC
      661 GCAGCTGTGC TCGACGTTGT CACTGAAGCG GGAAGGGACT GGCTGCTATT GGGCGAAGTG
      721 CCGGGGCAGG ATCTCCTGTC ATCTCACCTT GCTCCTGCCG AGAAAGTATC CATCATGGCT
      781 GATGCAATGC GGCGGCTGCA TACGCTTGAT CCGGCTACCT GCCCATTCGA CCACCAAGCG
      841 AAACATCGCA TCGAGCGAGC ACGTACTCGG ATGGAAGCCG GTCTTGTCGA TCAGGATGAT
      901 CTGGACGAAG AGCATCAGGG GCTCGCGCCA GCCGAACTGT TCGCCAGGCT CAAGGCGCGC
      961 ATGCCCGACG GCGAGGATCT CGTCGTGACC CATGGCGATG CCTGCTTGCC GAATATCATG
     1021 GTGGAAAATG GCCGCTTTTC TGGATTCATC GACTGTGGCC GGCTGGGTGT GGCGGACCGC
     1081 TATCAGGACA TAGCGTTGGC TACCCGTGAT ATTGCTGAAG AGCTTGGCGG CGAATGGGCT
     1141 GACCGCTTCC TCGTGCTTTA CGGTATCGCC GCTCCCGATT CGCAGCGCAT CGCCTTCTAT
     1201 CGCCTTCTTG ACGAGTTCTT CTAATAAGGG GATCTTGAAG TTCCTATTCC GAAGTTCCTA
     1261 TTCTCTAGAA AGTATAGGAA CTTCGAAGCA GCTCCAGCCT ACA
//
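As a sanity check on the annotation, the two KEIO priming sites and both FRT sites can be compared against the ends of the sequence. This is a small sketch that hard-codes only the first 70 and last 63 bases of the ORIGIN block above rather than parsing the whole record; the helper `region` and the variable names are made up for illustration.

```python
P1 = "ATTCCGGGGATCCGTCGACC"   # priming site 4, P1 in KEIO
P2 = "TGTAGGCTGGAGCTGCTTCG"   # priming site 1, P2 in KEIO
FRT = "GAAGTTCCTATTCTCTAGAAAGTATAGGAACTTC"

# Bases 1..70 and 1241..1303 copied from the ORIGIN block of the record.
HEAD = ("ATTCCGGGGATCCGTCGACCTGCAGTTCGAAGTTCCTATT"
        "CTCTAGAAAGTATAGGAACTTCAGAGCGCT")
TAIL = ("TTCCTATTCCGAAGTTCCTATTCTCTAGAAAGTATAGGAA"
        "CTTCGAAGCAGCTCCAGCCTACA")
TAIL_START = 1241  # 1-based coordinate of the first base of TAIL


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def region(fragment: str, fragment_start: int, start: int, end: int) -> str:
    """Slice a fragment using 1-based inclusive record coordinates."""
    return fragment[start - fragment_start:end - fragment_start + 1]


checks = {
    "P1 at 1..20": region(HEAD, 1, 1, 20) == P1,
    "FRT at 29..62": region(HEAD, 1, 29, 62) == FRT,
    "FRT at 1251..1284": region(TAIL, TAIL_START, 1251, 1284) == FRT,
    "P2 at complement(1284..1303)":
        revcomp(region(TAIL, TAIL_START, 1284, 1303)) == P2,
}
for label, ok in checks.items():
    print(label, "ok" if ok else "MISMATCH")
```

The P2 site and the second FRT overlap by one base at position 1284, which is why the two features share that coordinate.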
