Tutorial 4: Results Object
In this tutorial we will explore the run_evo3d() results object in detail. We will reuse the Hepatitis C Virus E1/E2 example.
e1 <- system.file("extdata", "e1_hepc_sorted.aln", package = "evo3D")
e2 <- system.file("extdata", "e2_hepc_sorted.aln", package = "evo3D")
pdb <- system.file("extdata", "e1e2_8fsj.pdb", package = "evo3D")
library(evo3D)
# add some stats #
sc <- list(calc_tajima = TRUE, calc_site_entropy = TRUE)
# adjust some patch parameters #
pc <- list(max_patch = 15, dist_cutoff = NA, distance_method = 'centroid')
results <- run_evo3d(list(e1, e2), pdb, pdb_controls = pc, stat_controls = sc,
detail_level = 2, # retain spatial haplotypes #
verbose = 0
)
using BLOSUM65
using BLOSUM65
Registered S3 method overwritten by 'pegas':
method from
print.amova ade4
We have covered evo3d_df in detail in tutorial 1.
str(results, max.level = 1)
List of 6
$ evo3d_df :'data.frame': 618 obs. of 18 variables:
$ final_msa_subsets:List of 256
$ msa_info_sets :List of 2
$ pdb_info_sets :List of 1
$ aln_info_sets :List of 1
$ call_info :List of 13
- attr(*, "class")= chr "evo3D_results"
final_msa_subsets
final_msa_subsets is now populated as well, because we set detail_level to 2. Individual spatial haplotypes (structure-informed MSA subsets) can be accessed through the msa_subset_id column of evo3d_df.
id <- results$evo3d_df$msa_subset_id[1]
results$final_msa_subsets[[id]][1:6, 1:6]
           [,1] [,2] [,3] [,4] [,5] [,6]
EU482883.1 "T"  "A"  "T"  "G"  "A"  "A"
EU155335.2 "T"  "A"  "C"  "G"  "A"  "A"
EU155262.2 "T"  "A"  "T"  "G"  "A"  "A"
EU155357.2 "T"  "A"  "C"  "G"  "A"  "A"
EU256054.1 "T"  "A"  "C"  "G"  "A"  "A"
EU256077.1 "T"  "A"  "T"  "G"  "A"  "A"
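Because each subset is a plain character matrix (sequences in rows, alignment columns in columns), it can be summarised with base R. The sketch below is not part of evo3D: it builds a small mock matrix from the first four rows of the output above (the name subset_mat is hypothetical) and counts the distinct haplotypes in it.

```r
# Mock of one entry of results$final_msa_subsets, copied from the output above.
# Assumption: a real entry has the same shape (character matrix, one row per sequence).
subset_mat <- matrix(
  c("T", "A", "T", "G", "A", "A",
    "T", "A", "C", "G", "A", "A",
    "T", "A", "T", "G", "A", "A",
    "T", "A", "C", "G", "A", "A"),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("EU482883.1", "EU155335.2", "EU155262.2", "EU155357.2"), NULL)
)

# Collapse each row into one string, then tabulate the distinct haplotypes.
hap_strings <- apply(subset_mat, 1, paste, collapse = "")
hap_counts <- table(hap_strings)
print(hap_counts)
```

With a real results object, the same two lines can be applied to results$final_msa_subsets[[id]] in place of the mock matrix.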
msa_info_sets
Information is stored for each input MSA. The first MSA passed is always named msa1, followed by msa2, and so on. msa_mat is the original input alignment. ref is the reference sequence, constructed as a consensus by default. pep is the translated reference sequence. seq_type is the auto-detected sequence type (nucleotide or protein).
str(results$msa_info_sets, max.level = 2)
List of 2
$ msa1:List of 4
..$ msa_mat : chr [1:271, 1:576] "T" "T" "T" "T" ...
.. ..- attr(*, "dimnames")=List of 2
..$ ref : Named chr "TATGAAGTGCGCAACGTGTCCGGGGTGTACCATGTCACGAACGACTGCTCCAACTCAAGCATTGTGTATGAGGCAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCC"| __truncated__
.. ..- attr(*, "names")= chr "ref.consensus"
..$ pep : Named chr "YEVRNVSGVYHVTNDCSNSSIVYEAADMIMHTPGCVPCVRENNSSRCWVALTPTLAARNASVPTTTIRRHVDLLVGAAAFCSAMYVGDLCGSVFLVSQLFTFSPRRHETVQ"| __truncated__
.. ..- attr(*, "names")= chr "ref.consensus"
..$ seq_type: chr "nucleotide"
..- attr(*, "class")= chr "evo3D_msa_info"
$ msa2:List of 4
..$ msa_mat : chr [1:271, 1:1278] "A" "A" "A" "A" ...
.. ..- attr(*, "dimnames")=List of 2
..$ ref : Named chr "AACACCCACGTGACAGGGGGGGCGGCAGCCCGCACCACCCGCGGGTTCACGTCCCTCTTTACACCTGGGCCGTCTCAGAAAATCCAGCTTATAAACACCAACGGCAGCTGG"| __truncated__
.. ..- attr(*, "names")= chr "ref.consensus"
..$ pep : Named chr "NTHVTGGAAARTTRGFTSLFTPGPSQKIQLINTNGSWHINRTALNCNDSLHTGFLAALFYTHKFNSSGCPERMASCRPIDKFAQGWGPITYAEPHSSDQRPYCWHYAPRPC"| __truncated__
.. ..- attr(*, "names")= chr "ref.consensus"
..$ seq_type: chr "nucleotide"
..- attr(*, "class")= chr "evo3D_msa_info"
pdb_info_sets
Like the MSAs, each input PDB has an entry in pdb_info_sets. pdb is the input structure object. chain lists the chains in this PDB that were used for downstream MSA↔PDB mappings, and seq_set holds the amino acid sequence of each of those chains. residue_dist is a distance matrix between residues (between alpha carbons by default). residue_df is a precursor to evo3d_df that contains the spatial windows before any mapping to codons; it is worth exploring in more depth.
str(results$pdb_info_sets, max.level = 2)
List of 1
$ pdb1:List of 5
..$ pdb :List of 8
.. ..- attr(*, "class")= chr [1:2] "pdb" "sse"
..$ chain : chr [1:2] "A" "E"
..$ seq_set : Named chr [1:2] "YEVRNASGLYHVTNDCSNASIVYETTDMIMHTPGCVPCVREDNSSRCWVALTPTLAARNASVPTPRRHETVQDCNCSIYPGH" "HINRTALNCNDSLHTGFLAALFYTHKFNASGCPERMAHCRPIDEFAQGWGPITYAEGHGSDQRPYCWHYAPRQCGTIPASQVCGPVYCFTPSPVVVGTTDRFGAPTYTWGE"| __truncated__
.. ..- attr(*, "names")= chr [1:2] "A" "E"
..$ residue_dist: num [1:366, 1:366] 0 5.64 7.98 11.64 7.72 ...
.. ..- attr(*, "dimnames")=List of 2
..$ residue_df :'data.frame': 366 obs. of 12 variables:
..- attr(*, "class")= chr "evo3D_pdb_info"
head(results$pdb_info_sets$pdb1$residue_df)
  residue_index aa orig_resno orig_chain orig_insert    sasa  rsa residue_id
1             1  H        421          E             188.480 0.93     421_E_
2             2  I        422          E              96.399 0.53     422_E_
3             3  N        423          E              85.474 0.52     423_E_
4             4  R        424          E              67.327 0.26     424_E_
5             5  T        425          E              33.799 0.22     425_E_
6             6  A        426          E               5.132 0.04     426_E_
  exposed
1    TRUE
2    TRUE
3    TRUE
4    TRUE
5    TRUE
6   FALSE
  patch
1 421_E_+422_E_+425_E_+529_E_+423_E_+442_E_+441_E_+427_E_+505_E_+424_E_+613_E_+438_E_+440_E_+527_E_+528_E_
2 422_E_+423_E_+425_E_+421_E_+424_E_+529_E_+527_E_+517_E_+505_E_+427_E_+528_E_+515_E_+516_E_+441_E_+525_E_
3 423_E_+422_E_+527_E_+529_E_+424_E_+425_E_+528_E_+421_E_+517_E_+525_E_+427_E_+531_E_+505_E_+532_E_+516_E_
4 424_E_+517_E_+527_E_+425_E_+423_E_+422_E_+525_E_+516_E_+528_E_+515_E_+529_E_+524_E_+520_E_+523_E_+505_E_
5 425_E_+422_E_+529_E_+424_E_+505_E_+427_E_+423_E_+517_E_+421_E_+528_E_+516_E_+428_E_+515_E_+527_E_+531_E_
6 <NA>
  patch_len max_dist
1        15    13.34
2        15    12.61
3        15    13.54
4        15    10.50
5        15    10.39
6        NA       NA
In residue_df, each spatial window (patch) is stored as a "+"-separated string of residue IDs, before being mapped to codons. In the output above, the one residue that is not exposed (row 6) has no patch (NA). Solvent accessibility is stored in the sasa (absolute accessibility) and rsa (relative accessibility) columns.
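A patch string can be split back into its member residues with base R. The sketch below uses a shortened mock of the patch column (three members instead of fifteen). It also assumes, judging only from the columns shown above, that a residue_id is orig_resno, orig_chain and orig_insert joined by underscores; that layout is inferred from the output, not taken from the evo3D documentation.

```r
# Shortened mock of residue_df$patch: one exposed residue and one without a patch.
patch <- c("421_E_+422_E_+425_E_", NA)

# Split on a literal "+" (fixed = TRUE avoids treating it as a regex quantifier).
# An NA patch stays NA in the resulting list.
members <- strsplit(patch, "+", fixed = TRUE)

# Assumed layout of a residue_id: <orig_resno>_<orig_chain>_<orig_insert>.
ids <- members[[1]]
parts <- strsplit(ids, "_", fixed = TRUE)
resno <- as.integer(vapply(parts, `[`, character(1), 1))
chain <- vapply(parts, `[`, character(1), 2)

print(data.frame(residue_id = ids, resno = resno, chain = chain))
```

On a real run, lengths(members) should agree with the patch_len column wherever a patch exists.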
aln_info_sets
aln_info_sets is stored per PDB entry and gathers all MSAs that map to that PDB. It sits between the purely geometric windows of pdb_info_sets and the codon-indexed (and possibly codon-collapsed) windows of evo3d_df. coverage was originally planned for a quick coverage plot, but this was not implemented. aln_df is an intermediate between residue_df and the final evo3d_df. When homomultimers are in use or multiple PDB files are input, aln_df stores the original residue-anchored but codon-indexed spatial windows before these windows are merged into one window per codon. pos_mat is likely the most important object here, as it is adjustable and allows run_evo3d() to restart after adjustment.
str(results$aln_info_sets, max.level = 2)
List of 1
$ pdb1:List of 3
..$ coverage:List of 2
..$ aln_df :'data.frame': 618 obs. of 16 variables:
..$ pos_mat :List of 2
..- attr(*, "class")= chr "evo3D_aln_info"
Examining alignment coverage
results$aln_info_sets$pdb1$coverage
$msa1_pdb1_A
$msa1_pdb1_A$ref_aa
[1] "1:192"
$msa1_pdb1_A$pdb_aa
[1] "1:64" "104:121"
$msa1_pdb1_A$mismatch
[1] 6 9 19 25 26 42
$msa2_pdb1_E
$msa2_pdb1_E$ref_aa
[1] "1:426"
$msa2_pdb1_E$pdb_aa
[1] "38:321"
$msa2_pdb1_E$mismatch
[1] 66 75 81 94 96 110 113 114 141 145 155 225 272
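Judging from the output above, pdb_aa appears to list the reference amino acid positions covered by the structure as "start:end" ranges, and ref_aa the full span of the reference. Under that assumption, the ranges can be expanded into positions to get a simple coverage fraction. The helper expand_ranges() below is hypothetical and not part of evo3D.

```r
# Values copied from the msa1_pdb1_A coverage entry above.
ref_aa <- "1:192"
pdb_aa <- c("1:64", "104:121")

# Hypothetical helper: turn "start:end" strings into one integer vector of positions.
expand_ranges <- function(x) {
  bounds <- strsplit(x, ":", fixed = TRUE)
  unlist(lapply(bounds, function(b) {
    b <- as.integer(b)
    seq(b[1], b[2])
  }))
}

covered <- expand_ranges(pdb_aa)    # positions with a structural residue
ref_len <- length(expand_ranges(ref_aa))

# Fraction of reference positions covered by the structure (82 of 192 here).
coverage_fraction <- length(covered) / ref_len
print(coverage_fraction)
```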
Each MSA-to-PDB mapping has a pos_mat entry; only the first is shown here. The object is highly nested, but it seemed best to adopt one structure that stays consistent and is easy for users to extract and inspect.
results$aln_info_sets$pdb1$pos_mat$msa1_pdb1_A[1:5,]
     ref_aa pdb_aa msa    pdb    codon residue_id
[1,] "Y"    "Y"    "msa1" "pdb1" "1"   "192_A_"
[2,] "E"    "E"    "msa1" "pdb1" "2"   "193_A_"
[3,] "V"    "V"    "msa1" "pdb1" "3"   "194_A_"
[4,] "R"    "R"    "msa1" "pdb1" "4"   "195_A_"
[5,] "N"    "N"    "msa1" "pdb1" "5"   "196_A_"
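Since pos_mat prints as a character matrix, even the codon index is stored as text. For inspection it can be convenient to convert it to a data frame with a numeric codon column. The sketch below rebuilds the five rows shown above as a mock (the names pos_mat and pos_df are local to this example) and looks up which residue a codon maps to.

```r
# Mock of the first five rows of pos_mat$msa1_pdb1_A, copied from the output above.
pos_mat <- matrix(
  c("Y", "Y", "msa1", "pdb1", "1", "192_A_",
    "E", "E", "msa1", "pdb1", "2", "193_A_",
    "V", "V", "msa1", "pdb1", "3", "194_A_",
    "R", "R", "msa1", "pdb1", "4", "195_A_",
    "N", "N", "msa1", "pdb1", "5", "196_A_"),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("ref_aa", "pdb_aa", "msa", "pdb", "codon", "residue_id"))
)

# Convert to a data frame and make the codon index numeric.
pos_df <- as.data.frame(pos_mat, stringsAsFactors = FALSE)
pos_df$codon <- as.integer(pos_df$codon)

# Which structural residue does codon 3 map to?
codon3_residue <- pos_df$residue_id[pos_df$codon == 3]
print(codon3_residue)

# Rows where the reference and structure amino acids disagree (none in this mock).
mismatch_rows <- which(pos_df$ref_aa != pos_df$pdb_aa)
print(mismatch_rows)
```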
call_info
The last element is call_info which stores the parameters of the workflow.
str(results$call_info)
List of 13
$ msa :List of 2
..$ msa1: chr "C:/Users/bradk/AppData/Local/R/win-library/4.4/evo3D/extdata/e1_hepc_sorted.aln"
..$ msa2: chr "C:/Users/bradk/AppData/Local/R/win-library/4.4/evo3D/extdata/e2_hepc_sorted.aln"
$ pdb :List of 1
..$ pdb1: chr "C:/Users/bradk/AppData/Local/R/win-library/4.4/evo3D/extdata/e1e2_8fsj.pdb"
$ run_grid :'data.frame': 2 obs. of 4 variables:
..$ msa : chr [1:2] "msa1" "msa2"
..$ pdb : chr [1:2] "pdb1" "pdb1"
..$ chain : chr [1:2] "A" "E"
..$ kmer_match: num [1:2] 0.709 0.836
..- attr(*, "out.attrs")=List of 2
.. ..$ dim : Named int [1:2] 2 1
.. .. ..- attr(*, "names")= chr [1:2] "msa" "pdb"
.. ..$ dimnames:List of 2
.. .. ..$ msa: chr [1:2] "msa=msa1" "msa=msa2"
.. .. ..$ pdb: chr "pdb=pdb1"
$ interface_chain :List of 1
..$ pdb1: logi NA
$ occlusion_chain :List of 1
..$ pdb1: logi NA
$ msa_controls :List of 5
..$ ref_method : chr "consensus"
..$ force_seq_type : NULL
..$ detect_sequence_threshold: num 0.8
..$ detect_sequence_len : num 100
..$ genetic_code : num 1
$ pdb_controls :List of 12
..$ distance_method : chr "centroid"
..$ drop_incomplete_residue: logi TRUE
..$ rsa_method : chr "rose"
..$ dist_cutoff : logi NA
..$ rsa_cutoff : num 0.1
..$ sasa_cutoff : logi NA
..$ use_rsa_sasa : chr "or"
..$ only_exposed_in_patch : logi TRUE
..$ max_patch : num 15
..$ interface_dist_cutoff : num 5
..$ force_file_type : NULL
..$ patch_mode : chr "codon"
$ aln_controls :List of 3
..$ use_sample_names : logi TRUE
..$ auto_chain_threshold: num 0.2
..$ kmer_size : num 4
$ stat_controls :List of 7
..$ calc_pi : logi FALSE
..$ calc_tajima : logi TRUE
..$ calc_hap : logi FALSE
..$ calc_site_entropy : logi TRUE
..$ calc_avg_patch_entropy: logi FALSE
..$ calc_block_entropy : logi FALSE
..$ valid_aa_only : logi TRUE
$ output_controls :List of 6
..$ output_dir : NULL
..$ write_msa_subsets : logi TRUE
..$ write_evo3d_df : logi TRUE
..$ write_call_info : logi TRUE
..$ write_module_intermediates: logi FALSE
..$ prefix : chr ""
$ collapse_controls:List of 2
..$ merge_type : chr "exposure_distance"
..$ merge_exposure: num 0.5
$ verbose : num 0
$ detail_level : num 2
Next tutorials:
- Custom statistics