class: center, middle, inverse, title-slide

# Bioconductor toolkit for single-cell RNA velocity
## Bioconductor Conference 2021
### Kevin Rue-Albrecht, Charlotte Soneson, Michael Stadler, Aaron Lun
### 2021-08-06 (updated: 2021-07-21)

---
layout: true

<div class="my-header"><img src="img/velociraptor_sticker.png" alt="logo" align="right" height="90%"></div>

<div class="my-footer"><span> Kevin Rue-Albrecht                  velociraptor </span></div>

---

# RNA velocity predicts the future state of cells

<img src="img/lamanno2018_fig3.png" width="2849" />

.right[
.small-p[
RNA velocity field describes fate decisions of major neural lineages in the hippocampus.
<a name=cite-lamanno2018></a>([La Manno, et al., 2018](https://doi.org/10.1038/s41586-018-0414-6))
]
]

---

# RNA velocity from spliced and unspliced mRNAs

<img src="img/lamanno2018_fig1.png" width="2140" />

.right[
.small-p[
Balance between unspliced and spliced mRNAs is predictive of cellular state progression.
([La Manno, et al., 2018](https://doi.org/10.1038/s41586-018-0414-6))
]
]

---

# scVelo

* Python package
  + [Read the Docs](https://scvelo.readthedocs.io/)
* Estimates RNA velocity from gene-wise spliced and unspliced abundances.
* Steady-state (deterministic), stochastic, and dynamical models.
  + [Read the Docs](https://scvelo.readthedocs.io/about/#rna-velocity-models)
* Uses the <i class="fab fa-python"></i> `AnnData` class for internal data representation.
  + [Read the Docs](https://anndata.readthedocs.io/en/latest/)
* Not straightforward to integrate into an R/Bioconductor-based workflow.
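
The steady-state model listed above can be illustrated with a minimal base-R sketch. It assumes simulated abundances for a single gene; the names `s`, `u`, `true_gamma` and `gamma_hat` are illustrative and not part of scVelo or velociraptor, and the fitting procedure scVelo actually uses is more involved than this least-squares fit.

```r
# Toy data for one gene (simulated, for illustration only):
# spliced (s) and unspliced (u) abundances across cells.
set.seed(1)
n_cells <- 200
true_gamma <- 0.5
s <- runif(n_cells, min = 0, max = 10)
u <- true_gamma * s + rnorm(n_cells, sd = 0.3)

# At steady state, u is proportional to s. Estimate the ratio (gamma)
# as the least-squares slope of u on s through the origin.
gamma_hat <- sum(u * s) / sum(s^2)

# Velocity is the deviation of each cell from the steady-state line:
# positive values mean more unspliced mRNA than expected at steady state.
velocity <- u - gamma_hat * s
```

In this sketch, cells with positive `velocity` would be interpreted as up-regulating the gene, and cells with negative `velocity` as down-regulating it.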
.right[
.small-p[
Generalizing RNA velocity to transient cell states through dynamical modeling
<a name=cite-bergen2020></a>([Bergen, et al., 2020](https://www.ncbi.nlm.nih.gov/pubmed/32747759))
]
]

---

# The Bioconductor factor

![](img/distracted-analyst.jpg)<!-- -->

---

# <img src="img/velociraptor_sticker.png" height="70px" style="vertical-align:bottom"> is a Bioconductor-friendly wrapper of <i class="fab fa-python"></i> scVelo

<img src="img/bioconductor_sticker.png" height="30px" style="vertical-align:bottom"> *[velociraptor](https://bioconductor.org/packages/3.14/velociraptor)* uses <img src="img/bioconductor_sticker.png" height="30px" style="vertical-align:bottom"> *[basilisk](https://bioconductor.org/packages/3.14/basilisk)* to run <i class="fab fa-python"></i> [scVelo](https://pypi.org/project/scvelo/) in a `BasiliskEnvironment` (i.e., a <img src="img/conda.png" height="30px" style="vertical-align:bottom"> [Conda](https://docs.conda.io/en/latest/) environment).

```r
setMethod("scvelo", "SummarizedExperiment", function(x, ...,
    assay.X="counts", assay.spliced="spliced", assay.unspliced="unspliced") {
    ...
    # Run scVelo inside the dedicated basilisk (Conda) environment.
    output <- basiliskRun(env = velo.env, fun = .run_scvelo,
        X = X, spliced = spliced, unspliced = unspliced,
        use.theirs = use.theirs, mode = mode,
        scvelo.params = scvelo.params,
        dimred = dimred)
    ...
})
```

`velociraptor::scvelo()` returns a `SingleCellExperiment` object containing the output of the velocity calculations.

---

# Input and data representation

* The input to <i class="fab fa-python"></i> [scVelo](https://pypi.org/project/scvelo/) is two gene-by-cell count matrices, tabulating the UMI count for spliced and unspliced variants of each gene.
* Several tools exist to estimate these counts from the raw reads <a name=cite-soneson2021></a>([Soneson, et al., 2021](https://www.ncbi.nlm.nih.gov/pubmed/33428615)).
* Count matrices are stored as assays in a `SummarizedExperiment` object.
* The <img src="img/bioconductor_sticker.png" height="30px" style="vertical-align:bottom"> *[zellkonverter](https://bioconductor.org/packages/3.14/zellkonverter)* package takes care of conversion between the `AnnData` and `SingleCellExperiment` formats.

.small-code[
```r
hermann
```

```
## class: SingleCellExperiment
## dim: 54448 1711
## metadata(0):
## assays(3): spliced unspliced logcounts
## rownames(54448): ENSMUSG00000102693.1 ENSMUSG00000064842.1 ...
##   ENSMUSG00000064369.1 ENSMUSG00000064372.1
## rowData names(0):
## colnames(1711): TGACAACAGGACAGAA TTGGAACAGGCGTACA ... TCGCGTTCAAGAGTCG
##   CACCTTGCAGATCGGA
## colData names(2): celltype sizeFactor
## reducedDimNames(2): PCA TSNE
## mainExpName: NULL
## altExpNames(0):
```
]

---

# Running velociraptor::scvelo()

Use gene selection, normalization and dimension reduction from <i class="fab fa-python"></i> [scVelo](https://pypi.org/project/scvelo/), or customize each step in <i class="fab fa-r-project"></i> for full compatibility with the rest of the workflow.

.small-code[
```r
hermann_velo <- velociraptor::scvelo(hermann,
    subset.row = top.hvgs, assay.X = "spliced", use.dimred = "PCA")
hermann_velo
```

```
## class: SingleCellExperiment
## dim: 2000 1711
## metadata(4): neighbors velocity_params velocity_graph
##   velocity_graph_neg
## assays(6): X spliced ... Mu velocity
## rownames(2000): ENSMUSG00000038015.6 ENSMUSG00000022501.6 ...
##   ENSMUSG00000115935.1 ENSMUSG00000029725.10
## rowData names(3): velocity_gamma velocity_r2 velocity_genes
## colnames(1711): TGACAACAGGACAGAA TTGGAACAGGCGTACA ... TCGCGTTCAAGAGTCG
##   CACCTTGCAGATCGGA
## colData names(7): velocity_self_transition root_cells ...
##   velocity_confidence velocity_confidence_transition
## reducedDimNames(1): X_pca
## mainExpName: NULL
## altExpNames(0):
```

```r
reducedDim(hermann_velo, "TSNE") <- reducedDim(hermann, "TSNE")
hermann_velo$celltype <- hermann$celltype
```
]

---

# Project velocities onto low-dimensional embedding

```r
# Packages providing the functions used below
# (presumably attached in a hidden setup chunk of the original slides).
library(velociraptor)  # embedVelocity(), gridVectors()
library(scater)        # plotTSNE()
library(ggplot2)       # geom_segment(), aes(), arrow(), unit()

embedded <- embedVelocity(reducedDim(hermann, "TSNE"), hermann_velo)
grid.df <- gridVectors(reducedDim(hermann, "TSNE"), embedded)
plotTSNE(hermann, colour_by="celltype") +
    geom_segment(data = grid.df,
        mapping = aes(x = start.1, y = start.2, xend = end.1, yend = end.2),
        arrow = arrow(length = unit(0.05, "inches")))
```

<img src="index_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

---

# Gene-wise visualizations

```r
plotVelocity(hermann_velo, c("ENSMUSG00000032601.13"),
    use.dimred = "TSNE", color_by = "celltype")
```

<img src="index_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

# Stream lines

```r
plotVelocityStream(hermann_velo, embedded, use.dimred = "TSNE",
    color.streamlines = TRUE, color_by = "velocity_pseudotime")
```

<img src="index_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

---

# Processing a dataset for RNA velocity analysis

* The Hermann spermatogenesis dataset used in these slides is available from the <img src="img/bioconductor_sticker.png" height="30px" style="vertical-align:bottom"> *[scRNAseq](https://bioconductor.org/packages/3.14/scRNAseq)* package.
* The script detailing the pre-processing is available [here](https://github.com/LTLA/scRNAseq/blob/master/inst/scripts/2.4.0/make-hermann-spermatogenesis-data.Rmd) <sup>1</sup>
* The source code for this presentation is available [here](https://github.com/kevinrue/velociraptor_talk_bioc2021) <sup>2</sup>

.center[
<img src="img/scrnaseq-github.png" height="300px">
]

.small-p[
<sup>1</sup> Long URL: <https://github.com/LTLA/scRNAseq/blob/master/inst/scripts/2.4.0/make-hermann-spermatogenesis-data.Rmd>

<sup>2</sup> Long URL: <https://github.com/kevinrue/velociraptor_talk_bioc2021>
]

---

# References

.small-p[
<a name=bib-bergen2020></a>[Bergen, V. et al.](#cite-bergen2020) (2020). "Generalizing RNA velocity to transient cell states through dynamical modeling". In: _Nat Biotechnol_ 38.12, pp. 1408-1414. ISSN: 1546-1696 (Electronic) 1087-0156 (Linking). DOI: [10.1038/s41587-020-0591-3](https://doi.org/10.1038%2Fs41587-020-0591-3). URL: [https://www.ncbi.nlm.nih.gov/pubmed/32747759](https://www.ncbi.nlm.nih.gov/pubmed/32747759).

<a name=bib-orchestrating2015></a>[Huber, W. et al.](#cite-orchestrating2015) (2015). "Orchestrating high-throughput genomic analysis with Bioconductor". In: _Nat Methods_ 12.2, pp. 115-21. ISSN: 1548-7105 (Electronic) 1548-7091 (Linking). DOI: [10.1038/nmeth.3252](https://doi.org/10.1038%2Fnmeth.3252). URL: [https://www.ncbi.nlm.nih.gov/pubmed/25633503](https://www.ncbi.nlm.nih.gov/pubmed/25633503).

<a name=bib-lamanno2018></a>[La Manno, G. et al.](#cite-lamanno2018) (2018). "RNA velocity of single cells". In: _Nature_ 560.7719, pp. 494-498. DOI: [10.1038/s41586-018-0414-6](https://doi.org/10.1038%2Fs41586-018-0414-6). URL: [https://doi.org/10.1038/s41586-018-0414-6](https://doi.org/10.1038/s41586-018-0414-6).

<a name=bib-soneson2021></a>[Soneson, C. et al.](#cite-soneson2021) (2021). "Preprocessing choices affect RNA velocity results for droplet scRNA-seq data". In: _PLoS Comput Biol_ 17.1, p. e1008585. ISSN: 1553-7358 (Electronic) 1553-734X (Linking). DOI: [10.1371/journal.pcbi.1008585](https://doi.org/10.1371%2Fjournal.pcbi.1008585). URL: [https://www.ncbi.nlm.nih.gov/pubmed/33428615](https://www.ncbi.nlm.nih.gov/pubmed/33428615).
]

.center[
<img src="img/bioconductor_sticker.png" height="300px">
<img src="img/velociraptor_sticker.png" height="300px">
]