Basic tutorial on Quantitative Fate Mapping (QFM)

Read example data

In this example, we demonstrate quantitative fate mapping on lineage barcode data measured with cell types. First, we apply Phylotime to reconstruct time-scaled phylogeny from lineage barcodes. Second, ICE-FASE is applied to extract parameters of progenitor population dynamics, including commitment times, progenitor population sizes and commitment biases.

We start with a character matrix, the example data is in the format below. An cell id is required, and is provided in the first column.

# read mutant allele matrix
cell_allele_table = readr::read_csv("../data/example/mutant_allele_matrix_with_missing.csv")
cell_allele_table[1:5]

## # A tibble: 1,600 × 5
##    cell             Site_1      Site_2      Site_3      Site_4     
##    <chr>            <chr>       <chr>       <chr>       <chr>      
##  1 type_-1_gen_4_1  Site1_Mut3  Site2_Mut28 Site3_Mut1  Site4_Mut50
##  2 type_-1_gen_4_2  Site1_Mut3  Site2_Mut28 Site3_Mut19 NA         
##  3 type_-1_gen_4_3  Site1_Mut3  Site2_Mut28 Site3_Mut1  Site4_Mut1 
##  4 type_-1_gen_4_4  Site1_Mut15 Site2_Mut3  Site3_Mut12 NA         
##  5 type_-1_gen_4_5  Site1_Mut15 Site2_Mut3  Site3_Mut12 Site4_Mut4 
##  6 type_-1_gen_4_6  Site1_Mut15 Site2_Mut14 Site3_Mut2  Site4_Mut7 
##  7 type_-1_gen_4_7  Site1_Mut3  Site2_Mut28 NA          Site4_Mut33
##  8 type_-1_gen_4_8  Site1_Mut3  Site2_Mut28 NA          Site4_Mut1 
##  9 type_-1_gen_4_9  Site1_Mut3  Site2_Mut28 Site3_Mut1  Site4_Mut8 
## 10 type_-1_gen_4_10 Site1_Mut3  Site2_Mut28 Site3_Mut9  Site4_Mut52
## # ℹ 1,590 more rows

First, we convert the table to a character matrix

chr_mat = as.matrix(cell_allele_table[-1])
rownames(chr_mat) = cell_allele_table$cell

Imputing missing data

Generally, mutation matrix contains missing data. We will use impute_characeters to impute the missing data.

chr_mat_imputed = impute_characters(chr_mat, nrounds = 100)

Running Phylotime

The input to Phylotime is a character matrix, with rows corresponding to cells and columns corresponding to barcoding sites, tibble / data.frame are convert to a character matrix. The matrix needs to have rownames, which will be used as the names of the cells.

Within each barcoding site (column), the same character denotes a unique mutant allele. The unmutated allele needs to be coded as the character “0”. (zero) The user also needs to input the total amount of active barcoding time, which is the duration for which mutations are being actively generated.

Phylotime estimates mutation rate from data provided and assumes a uniform allele emergence probability by default. More on how to set user-defined mutation parameters when prior information is available.

By default, Phylotime uses future to parallel pairwise distance estimations, use plan to set up parallel method and number of workers. Alternatively, set ‘parallel=F’.

In this example, barcoding activates at 0.6 days, and samples are collected at 15 days.

library(furrr)
plan(multisession, workers = 8)
tr = phylotime(chr_mat_imputed, t_total = 15. - 0.6)

If mutation rate or allele emergence probabilities are known a priori, they can be provided to phylotime

mut_rate_tb = read_csv("../data/example/mut_p_mut_rate.csv")
mut_rate_tb

## # A tibble: 50 × 2
##    site     rate
##    <chr>   <dbl>
##  1 site_1  0.937
##  2 site_2  0.430
##  3 site_3  0.641
##  4 site_4  0.430
##  5 site_5  0.937
##  6 site_6  0.430
##  7 site_7  0.394
##  8 site_8  0.269
##  9 site_9  0.936
## 10 site_10 0.534
## # ℹ 40 more rows

allele_prob_tb = read_csv("../data/example/mut_p_recur_vec.csv")
allele_prob_tb

## # A tibble: 137,548 × 3
##    site   mutation      prob
##    <chr>  <chr>        <dbl>
##  1 Site_1 Site1_Mut1  0.0756
##  2 Site_1 Site1_Mut2  0.0621
##  3 Site_1 Site1_Mut3  0.0608
##  4 Site_1 Site1_Mut4  0.0483
##  5 Site_1 Site1_Mut5  0.0455
##  6 Site_1 Site1_Mut6  0.0402
##  7 Site_1 Site1_Mut7  0.0390
##  8 Site_1 Site1_Mut8  0.0387
##  9 Site_1 Site1_Mut9  0.0375
## 10 Site_1 Site1_Mut10 0.0320
## # ℹ 137,538 more rows

tr = phylotime(chr_mat_imputed,
               mut_rate = mut_rate_tb$rate,
               recur_vec = make_recur_vec_list(allele_prob_tb),
               t_total = 15. - 0.6)

Now to visualize the reconstructed tree with the lineage barcodes:

plot_barcodes(chr_mat, tr, show_column_names = F)

To plot the reconstrcuted tree only:

plot_tr(tr)

Running ICE-FASE

Next, to apply ICE-FASE, we need, in additional, the terminal cell state classifications. This is provided as a named vector, where the name correspond to cell names. We can visualzie the tree with terminal cell states:

cell_type_tb = read_csv("../data/example/cell_type.csv")
sc_celltypes = cell_type_tb$type
names(sc_celltypes) = cell_type_tb$cell
plot_barcodes(chr_mat[, 1:50], tr, tip_celltype = sc_celltypes, show_column_names = F)

To run ICE-FASE: (‘root_time’ is the time until barcode activation.)

res = ice_fase(tr, sc_celltypes, total_time = 15 - 0.6, root_time = 0.6)

ICE-FASE results

With ICE-FASE fitted, a number of progenitor states are identified as nodes in the fate map topology. We can plot the reconstructed fate map topology with commitment times now. Note that the inferred progenitor states (iP) are denoted as “Node-x” below.

plot_topology(gr = res$gr,
              total_time = 15,
              gr_node_time = res$gr_trans_time,
              type_col = res$col_pal,
              show_node_label = T)

To visualize the internal node state assignment in reconstructed phylogeny:

plot_tr(res$tr,
        node_types = res$tr_node_assign,
        type_col = res$col_pal)

To visualize distribution of Inferred Commitment Event (ICE) times:

plot_ice_times(res)

To visualize progenitor population sizes:

plot_node_sizes(res)

Finally, we can get a summary table for the progenitor state estimates:

output_estimates(res)

## # A tibble: 15 × 9
##    progenitor_state commitment_time population_size downstream_state1
##    <chr>                      <dbl>           <dbl> <chr>            
##  1 Node-1                      2.57              10 Node-2           
##  2 Node-2                      4.92              47 Node-6           
##  3 Node-3                      3.85              20 Node-4           
##  4 Node-4                      7.57             128 Node-11          
##  5 Node-5                      6.61              81 Type15           
##  6 Node-6                      8.38             174 Node-13          
##  7 Node-7                      6.54             106 Node-8           
##  8 Node-8                      8.69             100 Type6            
##  9 Node-9                      7.85             105 Type8            
## 10 Node-10                     9.07             101 Type14           
## 11 Node-11                     9.53              90 Type12           
## 12 Node-12                     7.94             105 Type10           
## 13 Node-13                     9.76             112 Type2            
## 14 Node-14                     9.50             145 Type3            
## 15 Node-15                     9.85             110 Type16           
## # ℹ 5 more variables: downstream_size1 <dbl>, downstream_state2 <chr>,
## #   downstream_size2 <dbl>, commitment_bias1 <dbl>, commitment_bias2 <dbl>

Weixiang Fang

Read example data

Imputing missing data

Running Phylotime

Running ICE-FASE

ICE-FASE results