xCellData.Rmd
Abstract
Instructions on how to obtain 489 cell type gene signatures from Aran et al., 2017.
The xCellData package provides an R / Bioconductor resource for obtaining and representing the 489 cell type gene signatures from Aran, Hu, and Butte (2017).
This package uses the unisets Sets class to represent the collection of signatures. However, the data itself is distributed with the package as a GMT file, which may be parsed and imported by other packages (e.g. GSEABase GeneSetCollection, GeneSet tbl_geneset).
The script used to download and preprocess the data is distributed with the package. You can find it at the following location:
system.file(package = "xCellData", "scripts", "makeData.R")
## [1] "/home/travis/R/Library/xCellData/scripts/makeData.R"
Briefly, the script downloads “Additional file 3: The 489 cell type gene signatures. (XLSX 417 kb)” from the https://genomebiology.biomedcentral.com website and reformats the content of the published Microsoft Excel file into a GMT text file.
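To illustrate the GMT layout mentioned above, here is a minimal base R sketch that writes a toy GMT file and parses it back. It assumes the usual GMT convention of one set per line with tab-separated fields (set name, description, then gene identifiers). The helper read_gmt, the set names, and the gene names are all invented for illustration and are not part of xCellData.

```r
# Toy GMT content (invented names): set name, description, then genes,
# all separated by tabs, one set per line.
gmt_lines <- c(
    "setA\tsource1\tGENE1\tGENE2\tGENE3",
    "setB\tsource2\tGENE2\tGENE4"
)
gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Hypothetical helper (not part of xCellData): parse a GMT file
# into a named list of character vectors, one per set.
read_gmt <- function(path) {
    fields <- strsplit(readLines(path), "\t", fixed = TRUE)
    # Drop the first two fields (name and description) to keep the genes
    signatures <- lapply(fields, function(x) x[-(1:2)])
    names(signatures) <- vapply(fields, `[`, character(1), 1)
    signatures
}

signatures <- read_gmt(gmt_file)

# Long table with one row per element-set relation
relations <- data.frame(
    element = unlist(signatures, use.names = FALSE),
    set = rep(names(signatures), lengths(signatures)),
    stringsAsFactors = FALSE
)
relations
```

The long table has one row per gene-signature pair, which is the same shape as the element and set columns displayed for a Sets object.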
We use the xCellData() function to parse the GMT file distributed with the package into a unisets Sets object.
library(xCellData)
xsig <- xCellData()
xsig
## Sets with 20803 relations between 5079 elements and 489 sets
## element set
## <character> <character>
## [1] C1QA aDC_HPCA_1
## [2] C1QB aDC_HPCA_1
## [3] CCL13 aDC_HPCA_1
## [4] CCL17 aDC_HPCA_1
## [5] CCL19 aDC_HPCA_1
## ... ... ...
## [20799] IL2RA Tregs_HPCA_3
## [20800] KCNA2 Tregs_HPCA_3
## [20801] LAIR2 Tregs_HPCA_3
## [20802] MCF2L2 Tregs_HPCA_3
## [20803] RGS1 Tregs_HPCA_3
## -----------
## elementInfo: IdVector with 0 metadata
## setInfo: IdVector with 1 metadata (source)
The signatures may then be used for downstream analyses such as cell type annotation.
For instance, the Sets object can be split into a list of signatures, for use in functions such as lapply.
as.list(xsig)
## List of length 489
## names(489): Adipocytes_ENCODE_1 ... pro B-cells_NOVERSHTERN_3
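As a sketch of how such a list might be used with lapply, the example below computes the overlap between each signature and a set of query genes. The toy list stands in for as.list(xsig); all gene and signature names are invented, and this simple overlap is only an illustration, not the scoring method used by xCell.

```r
# Toy signatures standing in for as.list(xsig); names are invented
signatures <- list(
    typeA_1 = c("GENE1", "GENE2", "GENE3"),
    typeA_2 = c("GENE2", "GENE3", "GENE5"),
    typeB_1 = c("GENE4", "GENE6")
)

# Hypothetical query, e.g. marker genes detected for a cluster of cells
query <- c("GENE2", "GENE3", "GENE4")

# Genes shared between each signature and the query
shared <- lapply(signatures, intersect, y = query)

# Fraction of each signature that is found in the query
fraction <- lengths(shared) / lengths(signatures)
sort(fraction, decreasing = TRUE)
```

Because the result keeps the signature names, it can be sorted or filtered directly to shortlist candidate cell types.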
One may also inspect the number of genes in each signature.
dat <- setLengths(xsig)
hist(
    dat, breaks = 100, xlim = c(0, max(dat)),
    main = "Distribution of signature sizes",
    xlab = "Number of genes"
)
Examples of packages using xCellData include:
## R Under development (unstable) (2020-07-13 r78833)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] unisets_0.99.0 S4Vectors_0.27.9 BiocGenerics_0.35.2
## [4] xCellData_0.0.1 BiocStyle_2.17.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.19.4 xfun_0.15
## [3] reshape2_1.4.4 lattice_0.20-41
## [5] vctrs_0.3.0 htmltools_0.5.0
## [7] rtracklayer_1.49.2 yaml_2.2.1
## [9] blob_1.2.1 XML_3.99-0.3
## [11] rlang_0.4.7 pkgdown_1.5.1
## [13] DBI_1.1.0 BiocParallel_1.23.0
## [15] bit64_0.9-7 matrixStats_0.56.0
## [17] GenomeInfoDbData_1.2.3 plyr_1.8.6
## [19] stringr_1.4.0 zlibbioc_1.35.0
## [21] Biostrings_2.57.1 memoise_1.1.0
## [23] evaluate_0.14 Biobase_2.49.0
## [25] knitr_1.29 IRanges_2.23.6
## [27] GenomeInfoDb_1.25.0 AnnotationDbi_1.51.0
## [29] GSEABase_1.51.1 Rcpp_1.0.4.6
## [31] xtable_1.8-4 backports_1.1.8
## [33] BiocManager_1.30.10 DelayedArray_0.15.1
## [35] desc_1.2.0 graph_1.67.1
## [37] annotate_1.67.0 XVector_0.29.1
## [39] fs_1.4.1 bit_1.1-15.2
## [41] Rsamtools_2.5.1 digest_0.6.25
## [43] stringi_1.4.6 bookdown_0.20
## [45] GenomicRanges_1.41.1 rprojroot_1.3-2
## [47] grid_4.1.0 tools_4.1.0
## [49] bitops_1.0-6 magrittr_1.5
## [51] RCurl_1.98-1.2 RSQLite_2.2.0
## [53] crayon_1.3.4 MASS_7.3-51.6
## [55] Matrix_1.2-18 assertthat_0.2.1
## [57] rmarkdown_2.3 R6_2.4.1
## [59] GenomicAlignments_1.25.1 compiler_4.1.0
Aran, Dvir, Zicheng Hu, and Atul J. Butte. 2017. “xCell: Digitally Portraying the Tissue Cellular Heterogeneity Landscape.” Genome Biology 18 (1): 220. https://doi.org/10.1186/s13059-017-1349-1.