The Serotypes Command
The serotypes command helps Addgene’s Research team determine the serotype of a viral vector prep from Next Generation Sequencing (NGS) data of that prep. Addgene has identified sequences that are unique to commonly used capsids (signatures). The command reads all FASTQ files in an input folder, extracts the reads, and counts how often each signature sequence occurs. Because a small amount of RepCap plasmid is packaged within the AAV, the capsid used for production is represented in the FASTQ data. For example, an AAV2 vector should return counts for the AAV2 sequence, but not for the AAV5 or other sequences.
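The counting step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the toolkit’s actual implementation: the function names, the dictionary layout of the signatures, the file-name filter (.fastq and .fastq.gz), and the decision to also search the reverse complement of each read are all assumptions made for this example.

```python
import gzip
from collections import Counter
from pathlib import Path

def read_fastq_sequences(path):
    """Yield the base sequence of each 4-line FASTQ record (plain or gzipped)."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # the second line of each record holds the bases
                yield line.strip().upper()

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def count_signatures(reads, signatures):
    """Count reads containing at least one signature of each capsid.

    signatures: hypothetical layout {capsid name: [DNA sequence, ...]}.
    Each read is checked on both strands (an assumption of this sketch).
    """
    counts = Counter({name: 0 for name in signatures})
    for read in reads:
        rc = reverse_complement(read)
        for name, sequences in signatures.items():
            if any(seq in read or seq in rc for seq in sequences):
                counts[name] += 1
    return counts

def count_folder(input_folder, signatures):
    """Apply count_signatures to every FASTQ file in input_folder."""
    results = {}
    for path in sorted(Path(input_folder).iterdir()):
        if path.name.endswith((".fastq", ".fastq.gz")):
            results[path.name] = count_signatures(read_fastq_sequences(path), signatures)
    return results
```

This sketch counts reads rather than total occurrences, so a read is counted at most once per capsid; the real command may count differently.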
Notes:
- Occasionally you may see spurious matches for other serotypes, but one signature should account for the clear majority of matches.
- Because only a small amount of RepCap plasmid is packaged, getting no matches at all may simply mean that you need more NGS reads (deeper sequencing).
- If your capsid’s DNA sequence differs from the one in Addgene’s RepCap plasmids (for instance, the same amino acid sequence encoded by different codons), the default signatures may not match it. In that case, adjust the capsid sequences the program searches for so that they match your capsid’s DNA sequence. See the Configuration section below for how to add your own signatures.
- If you include the name of the expected serotype in the FASTQ file name, the command will report whether the top match agrees with this expectation.
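The checks in the notes above (one clear majority, no matches at all, and an expected serotype taken from the file name) could look like the sketch below. The file-name rule is a hypothetical choice made for this example: a capsid name only counts when no letter or digit touches it on either side, so AAV1 is not found inside AAV10, and the longest matching name wins. The command’s own rule may differ.

```python
import re

def expected_capsid(file_name, capsid_names):
    """Return the capsid name found in file_name, or None (hypothetical rule).

    A name only counts when no letter or digit touches it on either side, so
    "AAV1" is not found in "AAV10"; if several names match, the longest wins.
    """
    found = [
        name for name in capsid_names
        if re.search(rf"(?<![A-Za-z0-9]){re.escape(name)}(?![A-Za-z0-9])",
                     file_name, re.IGNORECASE)
    ]
    return max(found, key=len) if found else None

def summarize(file_name, counts):
    """Summarize signature counts for one FASTQ file.

    counts: {capsid name: number of matching reads}.
    """
    expected = expected_capsid(file_name, counts)
    total = sum(counts.values())
    if total == 0:
        # No signature reads at all: the sample may need more NGS reads.
        return {"top": None, "share": 0.0, "expected": expected, "as_expected": None}
    top, top_count = max(counts.items(), key=lambda item: item[1])
    return {
        "top": top,
        "share": top_count / total,  # a clear majority is well above 0.5
        "expected": expected,
        # None when the file name names no known capsid
        "as_expected": None if expected is None else top.lower() == expected.lower(),
    }
```

For a file named prep7_AAV10_R1.fastq with counts {"AAV1": 3, "AAV10": 95, "AAV2": 2}, this sketch reports AAV10 as both the top match and the expected capsid, with a share of 0.95.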
Configuration
The command’s parameters can be modified by editing the file parameters.yml with your favorite text editor. To change the parameter values, locate the section named serotypes and follow the examples in the file. The parameters are:
- input_folder: the folder containing the FASTQ files.
- output_folder: the folder where the output CSV files are written.
- signatures: the names and sequences of the signatures that the command looks for. This parameter is pre-populated with the signatures that Addgene has identified for commonly used capsids. More than one sequence can be specified for a given capsid. You may add other sequences if you need to; just follow the example syntax.
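When adding your own signatures, it can help to check them before a run. The sketch below assumes that the signatures section, once loaded from the YAML file, becomes a mapping from capsid name to either one sequence or a list of sequences; this layout and the function name are assumptions, so follow the actual examples in the parameters file. It upper-cases sequences, rejects anything other than A, C, G and T, and refuses a sequence listed under two different capsids, since such a sequence could not tell them apart.

```python
VALID_BASES = set("ACGT")

def normalize_signatures(raw):
    """Turn {name: seq or [seqs]} into {name: [UPPERCASE seqs]}, rejecting bad input."""
    normalized = {}
    for name, value in raw.items():
        # Accept a single sequence or a list of sequences per capsid.
        sequences = [value] if isinstance(value, str) else list(value)
        if not sequences:
            raise ValueError(f"{name}: no sequences given")
        cleaned = []
        for seq in sequences:
            seq = seq.strip().upper()
            if not seq or set(seq) - VALID_BASES:
                raise ValueError(f"{name}: invalid sequence {seq!r}")
            cleaned.append(seq)
        normalized[name] = cleaned

    # A sequence shared by two capsids would make their counts ambiguous.
    owners = {}
    for name, seqs in normalized.items():
        for seq in seqs:
            if seq in owners and owners[seq] != name:
                raise ValueError(f"{seq} is listed for both {owners[seq]} and {name}")
            owners[seq] = name
    return normalized
```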
Procedure
- Adjust the parameters for the script by editing the file parameters.yml as described above.
- In a terminal window, navigate to the toolkit folder (if you’re using our Docker container, run a shell in the container first):
cd openbio-master/toolkit
- Issue the following command:
python atk.py serotypes
- Once the command finishes, you will find the output CSV files in the output folder you configured. Two files are generated: a full report and a summary. Their file names contain the date on which the report was generated.
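The two output files could be produced along the lines of the sketch below. The file-name pattern (serotypes_full_DATE.csv and serotypes_summary_DATE.csv) and the column layout are hypothetical; check the files the command actually writes for their real names and columns. Here, results is assumed to map each FASTQ file name to a mapping of capsid name to read count.

```python
import csv
from datetime import date
from pathlib import Path

def report_paths(output_folder, run_date):
    """Hypothetical date-stamped names for the full report and the summary."""
    stamp = run_date.isoformat()  # YYYY-MM-DD
    folder = Path(output_folder)
    return (folder / f"serotypes_full_{stamp}.csv",
            folder / f"serotypes_summary_{stamp}.csv")

def summary_rows(results):
    """One row per FASTQ file: file, top capsid, its count, total matches."""
    rows = []
    for file_name, counts in results.items():
        total = sum(counts.values())
        if total:
            top, top_count = max(counts.items(), key=lambda item: item[1])
        else:
            top, top_count = "", 0  # no signature found in this file
        rows.append([file_name, top, top_count, total])
    return rows

def write_reports(output_folder, results, run_date=None):
    """Write both CSV files and return their paths."""
    full_path, summary_path = report_paths(output_folder, run_date or date.today())
    full_path.parent.mkdir(parents=True, exist_ok=True)
    with open(full_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "capsid", "count"])
        for file_name, counts in results.items():
            for capsid, count in counts.items():
                writer.writerow([file_name, capsid, count])
    with open(summary_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "top_capsid", "top_count", "total_matches"])
        writer.writerows(summary_rows(results))
    return full_path, summary_path
```

Putting the date in the file name keeps reports from separate runs side by side instead of overwriting each other.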