Team:KCL UK/Software

Software

Introduction:

Synthetic biology relies heavily on the integration of mathematics and engineering principles to design new biological systems with novel functionalities. Alongside our wet lab studies, we have created two software tools. Together they let users investigate the size limitations of current viral vectors and explore how capsid architecture may be manipulated to accommodate a candidate therapeutic gene.

Background: Biological Principles

What parameters can be used to determine the size of viral capsids?

Viruses vary widely in complexity and have evolved unique protein containers to encapsulate their genomes. These capsids are composed of numerous copies of one or a few different types of structural proteins that exist in particular ratios [1]. The structural proteins are organised into subunit conformations known as triangular facets, which assemble into spherical structures with icosahedral symmetry [1].

The number and arrangement of capsid proteins varies widely between viruses; however, all icosahedral capsids are limited by geometric constraints. The capsid is constructed from the same type of subunit-subunit interactions, although these interactions may differ through subtle non-symmetrical changes in their environments [2]. The ‘triangulation number’ represents the number of unique environments that the subunits may occupy and can be used as a quantitative metric for capsid size [3]. The triangulation number is proportional to the size of the viral capsid: larger viruses have a larger number of subunit proteins, which is in part attributed to changes in the organisation (ratio) of the structural proteins comprising the protein shell [3].

How have these concepts influenced the objectives of our software?

Viruses with icosahedral symmetry come in many sizes, but the size of capsid proteins remains relatively static; we therefore cannot increase the size of the capsid by increasing the size of these proteins. However, larger viral capsids can be constructed through the addition of more protein subunits. As a result, our team set out to construct software tools designed to manipulate the ratios of these capsid proteins to increase the packaging capacities of viral vectors.

Our Software:

Our software tools were developed in Python. Data for capsid dimensions were retrieved from the VIPERdb database, and interactive web-based subplots were designed using the plotly Python library.

The results are presented to the user as a combination of detailed tables printed within the shell of the user’s IDE and web-based subplots highlighting selected components of the data.

The GitHub for our software tools can be found here.

CapsidOptimiser:

CapsidBuilder:

CapsidOptimiser: Designing Viral Capsids

Figure 1: Schematic showing the overall process of using CapsidOptimiser to design viral capsids with optimised packaging capacities.

The aim of our initial software tool was to address the limited packaging capacities of viral vectors. The four main families of viruses used for gene delivery are Adenoviridae, Herpesviridae, Parvoviridae and Retroviridae. For these capsids, there is a discontinuity between packaging capacity and overall capsid size. Therefore, to determine the most appropriate viral capsid for the delivery of a candidate gene, our selection process includes calculations of internal and external spherical and icosahedral capsid volumes, alongside cylindrical volumes related to packaging capacities. In this way, we were able to determine not only which viral capsid would be a suitable candidate but also how its pre-existing geometry may be altered.

Part One:

1) Candidate gene selection and determination of capsid availability:

One of the main motivations driving the work of the Capacity project has been to encourage the development of technologies that may effectively deliver therapeutic genes to patients affected by rare genetic diseases. Keeping this in mind, our model includes calculations for a total of 57 candidate genes associated with rare genetic diseases.

Firstly, the user chooses a gene from a list of rare genetic diseases. The volume of the gene is calculated using equations for cylindrical volumes, which differ for double-stranded and single-stranded forms of nucleic acid. Genomic volumes were calculated assuming that double-stranded DNA molecules are cylindrical with a diameter of ~20 Å and a distance of ~3.4 Å between adjacent nucleotides in the backbone.

Equation 1: Calculating the volume of the double stranded genome.
$$V = (3.4 \times L) \times \pi \times \left(\frac{20}{2}\right)^{2} \approx 1{,}068L$$

*where L is the genome length (nt)

Equation 2: Calculating the volume of the single stranded genome.
$$V = 534L$$

Equation 3: Calculating the spherical volume of a capsid
$$V = \frac{4}{3} \times \pi \times r^{3}$$

Viral capsids are then divided into two groups based on those packaging either double stranded or single stranded nucleic acid. The genome volume of the selected gene is then compared to each appropriate volume. The user will receive a list of the number of available viral capsids that can carry the gene based on the capsid inner spherical volume.

Equations and data for capsid dimensions were obtained from Brandes and Linial, 2016 [4].
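The volume comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the CapsidOptimiser code itself: the function names, the dictionary layout and the example capsids are all hypothetical, and the single-stranded constant is assumed to be half the double-stranded one, which is consistent with the 534L figure in Equation 2.

```python
import math

# Volume per nucleotide in cubic angstroms.
DS_VOLUME_PER_NT = 3.4 * math.pi * (20 / 2) ** 2  # ~1,068 (Equation 1)
SS_VOLUME_PER_NT = DS_VOLUME_PER_NT / 2           # ~534 (Equation 2, assumed half)


def genome_volume(length_nt, double_stranded=True):
    """Cylindrical genome volume for a genome of length_nt nucleotides."""
    per_nt = DS_VOLUME_PER_NT if double_stranded else SS_VOLUME_PER_NT
    return per_nt * length_nt


def sphere_volume(radius):
    """Spherical capsid volume (Equation 3) for an inner radius in angstroms."""
    return 4 / 3 * math.pi * radius ** 3


def capsids_that_fit(length_nt, double_stranded, capsids):
    """Names of capsids of the matching genome type that can hold the genome.

    `capsids` maps a name to (inner_radius, packages_double_stranded).
    """
    needed = genome_volume(length_nt, double_stranded)
    return [
        name
        for name, (radius, packs_ds) in capsids.items()
        if packs_ds == double_stranded and sphere_volume(radius) >= needed
    ]


# Hypothetical capsids for illustration only (not VIPERdb values).
example_capsids = {
    "capsid_A": (100.0, False),
    "capsid_B": (300.0, True),
    "capsid_C": (50.0, True),
}
print(capsids_that_fit(5000, True, example_capsids))
```

Only capsids of the matching genome type are compared, mirroring the split into double stranded and single stranded groups described above.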

2) Compare the list of available capsids with approved viral capsid that have been used for therapeutics:

Of the available viral capsids, the program then determines which belong to families approved for therapeutics (Adenoviridae, Herpesviridae, Parvoviridae, and Retroviridae). The user will receive a list of the approved viral vectors with a capsid volume larger than the genome volume. The program then sorts through these approved capsids to find the smallest one or, if no capsids are available, calculates a readjustment value. Readjustment values are calculated from the difference between the spherical volume required by the gene and the spherical volume of the remaining capsids.

3) Determine capsids with a sufficient packaging capacity:

The packaging capacity of each viral capsid differs from its inner spherical volume: for all four approved viral capsids, the packaging volume is smaller than the overall volume of the capsid. Packaging volumes are calculated using Equations 1 and 2, where Adenoviridae and Herpesviridae package double-stranded nucleic acid, whereas Parvoviridae and Retroviridae package single-stranded nucleic acid. The genome volume of the selected gene is compared to the packaging volume of each viral capsid.

Table 1: Packaging capacities of the viral capsids approved for therapy.*

Viral Family     Packaging Capacity (kb)   Packaging Volume
Adenoviridae     7.5                       478829133
Herpesviridae    >30                       1121104558
Parvoviridae     4.5                       27816758
Retroviridae     8                         7119751

*Data obtained from Lundstrom, 2018.

Selection of the most appropriate viral capsid is based only on the packaging volume, as this is the only clinically relevant measurement. Selection proceeds as follows:

  1. If there are available viral capsids with sufficient packaging capacity, the selected capsid will be the virus with the smallest possible packaging capacity;
  2. If there are no viruses with a sufficient packaging capacity, selection is based on the capsid that requires the smallest volume adjustments. This ensures there is not an excess amount of available space within the capsid.
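The two selection rules above can be sketched as follows. This is an illustrative reading of the rules and not the project's actual code: the function name and return shape are hypothetical, the packaging volumes are copied from Table 1, and for brevity the sketch ignores the split between double stranded and single stranded genomes.

```python
# Packaging volumes copied from Table 1 (units as reported there).
PACKAGING_VOLUMES = {
    "Adenoviridae": 478829133,
    "Herpesviridae": 1121104558,
    "Parvoviridae": 27816758,
    "Retroviridae": 7119751,
}


def select_capsid(genome_volume, packaging_volumes=PACKAGING_VOLUMES):
    """Return (family, volume_adjustment) for a genome volume.

    Rule 1: if any capsid is large enough, pick the smallest sufficient one
            (adjustment is 0).
    Rule 2: otherwise pick the capsid needing the smallest extra volume.
    """
    sufficient = {
        family: volume
        for family, volume in packaging_volumes.items()
        if volume >= genome_volume
    }
    if sufficient:
        return min(sufficient, key=sufficient.get), 0

    family = min(
        packaging_volumes,
        key=lambda name: genome_volume - packaging_volumes[name],
    )
    return family, genome_volume - packaging_volumes[family]


print(select_capsid(10_000_000))
print(select_capsid(2_000_000_000))
```

Under rule 2 the capsid with the largest packaging volume is always chosen, since it leaves the smallest shortfall.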

Part Two: Determining the new triangulation number and new required amount of protein.

From viral capsid selection, a new triangulation number and new amount of protein may be calculated.

1) Calculating the new T value:

To determine the new triangulation number, the icosahedral volume required for the selected gene is calculated. An icosahedral volume is needed because the new T value is determined from the dimensions of a capsid’s triangular face; a spherical volume therefore cannot be used.

Icosahedral volumes are calculated by first determining which type of genome the selected capsid packages (double stranded or single stranded). The appropriate single or double stranded cylindrical genome volume is then converted to a spherical volume (the spherical volume is equal to 2/3 of the cylindrical volume). This spherical volume is then converted to an icosahedral volume using the conversion value 0.98.*

*This value was obtained by dividing the icosahedral volumes of the approved viral capsids by their spherical volumes (using the inner capsid radius). Conversion values for icosahedral volumes based on the outer radius were obtained in the same way. Determining the icosahedral volume from the outer capsid radius provides the dimensions of the overall capsid structure.
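The two-step conversion described above is short enough to show as a sketch. The constant and function names are hypothetical; the factors 2/3 and 0.98 are the ones quoted in the text, with 0.98 applying to the inner-radius case.

```python
SPHERE_PER_CYLINDER = 2 / 3      # spherical volume as a fraction of cylindrical volume
ICOSAHEDRON_PER_SPHERE = 0.98    # conversion value quoted in the text (inner radius)


def icosahedral_volume_from_genome(cylindrical_volume):
    """Convert a cylindrical genome volume into the matching icosahedral volume."""
    spherical_volume = cylindrical_volume * SPHERE_PER_CYLINDER
    return spherical_volume * ICOSAHEDRON_PER_SPHERE


print(icosahedral_volume_from_genome(3000.0))
```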

Once the icosahedral volume for the gene is acquired, the area of a single triangular face is calculated using the following equations:

Equation 4: Calculating the volume of an icosahedron
$$V = \frac{5}{12} \times a^{3} \times \left(3 + \sqrt{5}\right)$$

Where V is the icosahedral volume, and a is the side length of a single triangular face. Equation 4 was rearranged to determine the side length of the triangular face, used in Equation 5.

Equation 5: Determining the mid-sphere radius:
$$R_{m} = \frac{a}{4} \times \left(1 + \sqrt{5}\right)$$

Where Rm is the mid-sphere radius

Once the radius is determined, the area of a single triangular face, A∆, is obtained by finding the total capsid area and dividing it by 20 (all icosahedra have 20 identical triangular faces).

Equation 6: Calculating the area of the triangular face
$$A_{\Delta} = \frac{5 \times a^{2} \times \sqrt{3}}{20}$$

These calculations were repeated for the approved viral capsid selected for the delivery of the gene. With both the triangular face area of the theoretical viral capsid (for the selected gene) and the triangular face area of the chosen viral capsid in hand, an enlargement value is determined by dividing the theoretical triangular face area by the pre-existing triangular face area.

A new T value is then determined by multiplying the enlargement value by the original T value of the selected viral capsid.
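The chain from icosahedral volume to new T value can be sketched as below, using the icosahedron volume, mid-sphere radius and face area formulas (Equations 4 to 6). The function names are hypothetical and this is one possible reading of the steps, not the project's code; in particular, it returns the raw product without rounding to a whole number.

```python
import math


def side_length_from_volume(volume):
    """Invert Equation 4, V = (5/12) * a**3 * (3 + sqrt(5)), to recover a."""
    return (12 * volume / (5 * (3 + math.sqrt(5)))) ** (1 / 3)


def midsphere_radius(side_length):
    """Equation 5: mid-sphere radius of an icosahedron with the given side."""
    return side_length / 4 * (1 + math.sqrt(5))


def face_area(side_length):
    """Equation 6: total surface area 5 * sqrt(3) * a**2 shared by 20 faces."""
    return 5 * side_length ** 2 * math.sqrt(3) / 20


def new_triangulation_number(target_volume, current_volume, current_t):
    """Scale the current T value by the ratio of triangular face areas."""
    target_area = face_area(side_length_from_volume(target_volume))
    current_area = face_area(side_length_from_volume(current_volume))
    enlargement = target_area / current_area
    return enlargement * current_t


# A capsid with 8x the icosahedral volume has 2x the side length and
# therefore 4x the face area.
print(new_triangulation_number(8.0, 1.0, 1))
```

Because face area scales with the square of the side length while volume scales with its cube, the enlargement value grows as the volume ratio to the power 2/3.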

2) Determining the new amount of required protein:

A new number of subunit proteins is required to construct a larger viral capsid, whose size can be described by the triangulation number. The total number of subunit proteins within a viral capsid is found by multiplying the triangulation number by 60, as the number of viral capsid subunits increases in multiples of 60.
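As a small worked example of this relationship, the sketch below multiplies a T value by 60. The function name is hypothetical, and how a non-integer T value from the enlargement step is rounded is not stated in the text, so no rounding is applied here.

```python
def subunit_count(triangulation_number):
    """Total subunit proteins in a capsid: 60 per unit of triangulation number."""
    return 60 * triangulation_number


for t in (1, 3, 4):
    print(t, subunit_count(t))
```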

Part Three: Determining the stability of the capsid

Capsid stability, Cs, is calculated as the sum of the association energies, EA, at each unique interface between protein subunits, normalised by the capsid diameter, d. This parameter was used to determine the extent to which capsid stability changes in the newly constructed viral capsid; a more negative capsid stability value indicates a more stable capsid. In this way, we have a parameter that is comparable between different viral capsids, which allows us to consider the biological feasibility of the new icosahedral architecture of the viral capsid.

Equation 7: Calculating the viral capsid stability
$$C_{s} = \frac{\sum E_{A}}{d}$$

Viral Family     Outer Diameter (Å)   Sum of Association Energies (kJ)   Capsid Stability (kJ/Å)
Adenoviridae     976                  -415.7                             -0.425922131
Herpesviridae    data not available*
Parvoviridae     378                  -383.1125                          -1.013525132
Retroviridae     116                  -3.2                               -0.027586207

*Values for association energies were acquired from the VIPERdb database. Data for Herpesviridae were not available, as there were no unique interfaces between subunits.
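The stability values in the table can be reproduced with a short sketch. The function name is hypothetical; the inputs below are the summed association energies and outer diameters from the table, passed as single-element lists because the individual interface energies are not listed here.

```python
def capsid_stability(association_energies, outer_diameter):
    """Sum the interface association energies and normalise by capsid diameter."""
    return sum(association_energies) / outer_diameter


# (summed association energy, outer diameter) as given in the table.
table_rows = {
    "Adenoviridae": ([-415.7], 976),
    "Parvoviridae": ([-383.1125], 378),
    "Retroviridae": ([-3.2], 116),
}

for family, (energies, diameter) in table_rows.items():
    print(family, round(capsid_stability(energies, diameter), 9))
```

A more negative result corresponds to a more stable capsid, so of the three families with data, Parvoviridae is the most stable by this measure.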

Part Four: plotly Viral Capsid Optimisation Subplot Results Page

Figure 2: Total subunit protein levels.
Figure 3: Capsid size comparison model.

CapsidBuilder: New Protein Ratios and Usability of Wet Lab Constructs

Alongside calculating new capsid architectures, it was also important to address how these structures may actually be constructed. Our second software tool is designed to generate new ratios of subunit proteins corresponding to the new number of capsid proteins required to construct the novel capsid. In addition, the feasibility of capsid construction is evaluated by determining the usability of a suitable wet lab construct through translation efficiency measurements.

*This tool only performs calculations of protein ratios and translation efficiencies for the adeno-associated viral capsid. This constraint on the model reflects the limited amount of published data for the remaining three viral vectors.

The adeno-associated virus (Parvoviridae family) has a triangulation number of 1, so its total number of protein subunits is 60. The capsid is made up of three structural proteins: VP1, VP2, and VP3. These exist in a ratio of 1:1:10 [6]. Readjustment of this capsid to package a candidate gene is achieved by altering this protein ratio.

Part 1: Determining novel protein ratios

The derivation of new subunit protein ratios for the adeno-associated virus was achieved through a series of steps that use previous calculations from our first capsid remodelling program. Firstly, the user selects one candidate gene from the list of 57 rare genetic diseases. The new number of capsid subunit proteins is then used to generate random sets of newly derived protein ratios through the following steps:

1) Generate lists of new potential protein ratios.

  1. Determine the number of VP1, VP2 and VP3 proteins that occupy a single triangular face (achieved by dividing the total subunit count for the entire capsid by 20).
  2. Create a list of 15 new protein ratios: the program generates random values for each of the three subunit proteins that sum to the total protein count in a single triangular face.
  3. Remove ratios that include a value of 0 for any of the subunit proteins.
  4. Determine the greatest common divisor (GCD) of the three randomly generated values for VP1, VP2, and VP3 in each ratio.
  5. Simplify each ratio by its GCD and present the output to the user.
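The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the CapsidBuilder implementation: the function name, the `seed` parameter and the way the random values are drawn (VP1 first, then VP2, with VP3 taking the remainder) are all hypothetical choices, since the text does not specify the sampling scheme.

```python
import math
import random


def random_face_ratios(total_subunits, n_ratios=15, seed=None):
    """Generate simplified VP1:VP2:VP3 ratios for one triangular face."""
    rng = random.Random(seed)
    per_face = total_subunits // 20  # step 1: subunits on a single face

    ratios = []
    for _ in range(n_ratios):  # step 2: random values summing to per_face
        vp1 = rng.randint(0, per_face)
        vp2 = rng.randint(0, per_face - vp1)
        vp3 = per_face - vp1 - vp2
        if 0 in (vp1, vp2, vp3):  # step 3: drop ratios containing a zero
            continue
        divisor = math.gcd(vp1, math.gcd(vp2, vp3))  # step 4
        ratios.append((vp1 // divisor, vp2 // divisor, vp3 // divisor))  # step 5
    return ratios


for ratio in random_face_ratios(240, seed=1):
    print(":".join(str(value) for value in ratio))
```

Because ratios containing a zero are discarded after generation, fewer than 15 ratios may be returned. Passing a `seed` makes a run repeatable.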

2) Evaluate the suitability of wet lab constructs using translation efficiency measurements.

From the newly generated list of protein ratios, calculations that integrate wet lab sequences are used to determine translation efficiencies for each subunit protein. These calculations use mRNA folding dynamics and ribosome-binding dynamics, which are described in more detail in our modelling section. The values are then evaluated against pre-existing baseline efficiencies to determine which of the constructs is most suitable for expression of each subunit protein.

3) Selection of suitable wet lab constructs.

  1. New values for each subunit protein are initially derived from the protein ratio algorithm. These values are used to determine an enlargement factor by dividing the new amount of each subunit protein by the original amount (VP1 = 5, VP2 = 5, VP3 = 50).
  2. From the lists of protein ratios, each of the three subunit proteins is grouped separately.
  3. Translation efficiencies are calculated for each subunit protein using the 5 constructs, where values for T_(mRNA) are based on the newly required quantity of each subunit protein.
  4. The subunit values are regrouped into their original protein ratios, and a summary table outlining the selected constructs and translation efficiencies is displayed.
Figure 4: plotly Translation efficiency subplot results page; including summary table of selected constructs and a graph of subunit translation efficiencies.
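The enlargement factor from step 1 of the construct selection can be illustrated with a short sketch. The function name and the example new counts are hypothetical; the original counts (VP1 = 5, VP2 = 5, VP3 = 50) are those given in the text. The translation efficiency calculation itself is described in the modelling section and is not reproduced here.

```python
# Original subunit amounts for the adeno-associated virus capsid (from the text).
ORIGINAL_COUNTS = {"VP1": 5, "VP2": 5, "VP3": 50}


def enlargement_factors(new_counts, original_counts=ORIGINAL_COUNTS):
    """Divide the new amount of each subunit protein by its original amount."""
    return {
        protein: new_counts[protein] / original_counts[protein]
        for protein in original_counts
    }


# Hypothetical new amounts for an enlarged capsid, for illustration only.
print(enlargement_factors({"VP1": 10, "VP2": 20, "VP3": 150}))
```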

References

  1. Hagan, M. F. (2014) Modeling Viral Capsid Assembly. Advances in Chemical Physics. 155, 1–68.
  2. Twarock, R. & Luque, A. (2019) Structural puzzles in virology solved with an overarching icosahedral design principle. Nature Communications. 10 (1).
  3. Mannige, R. V. & Brooks, C. L. (2010) Periodic Table of Virus Capsids: Implications for Natural Selection and Design. PLoS ONE. 5 (3).
  4. Brandes, N. & Linial, M. (2016) Gene overlapping and size constraints in the viral world. Biology Direct. 11 (1).
  5. Lundstrom, K. (2018) Viral Vectors in Gene Therapy. Diseases. 6 (2), 42.
  6. Bosma, B. et al. (2018) Optimization of viral protein ratios for production of rAAV serotype 5 in the baculovirus system. Gene Therapy. 25 (6), 415–424.
  7. Na, D. et al. (2010) Mathematical modeling of translation initiation for the estimation of its efficiency to computationally design mRNA sequences with desired expression levels in prokaryotes. BMC Systems Biology. 4 (1), 71.