Identify Markers by Largest Difference of Detection Rate in Clusters

This function computes the detection rate of each feature in each cluster. For each cluster, it ranks all features by decreasing difference between the detection rate in the target cluster and the detection rate in all other clusters. The function can limit the results to at most `n` markers per cluster.
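The idea can be illustrated with a minimal base R sketch on a toy matrix. This is only one reading of the description above, not the package's implementation; the objects `counts`, `cluster`, `detected`, `detection_rate`, `diff_A`, and `ranked_A` are hypothetical names introduced for illustration.

```r
# Toy count matrix (assumed data): 4 features (rows) x 6 samples (columns).
counts <- matrix(
  c(0, 2, 1, 0, 0, 0,
    3, 1, 0, 0, 1, 0,
    0, 0, 0, 4, 2, 1,
    1, 1, 1, 1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:6))
)
cluster <- factor(c("A", "A", "A", "B", "B", "B"))
threshold <- 0

# A feature counts as detected in a sample when its value exceeds the threshold.
detected <- counts > threshold

# Detection rate: fraction of samples in each cluster where the feature is detected.
detection_rate <- sapply(levels(cluster), function(k) {
  rowMeans(detected[, cluster == k, drop = FALSE])
})

# With only two clusters, the contrast for target cluster "A" is a plain difference.
diff_A <- detection_rate[, "A"] - detection_rate[, "B"]

# Rank features by decreasing difference in detection rate.
ranked_A <- names(sort(diff_A, decreasing = TRUE))
ranked_A
```

In this toy case, features detected mostly in cluster "A" come first and features detected mostly in cluster "B" come last.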

learnMarkersByPositiveProportionDifference(se, cluster.col,
  assay.type = "counts", threshold = 0, n = Inf, min.diff = 0.1,
  min.prop = 0.1, diff.method = c("min", "mean", "median", "max"))

Arguments

se	An object of class inheriting from "`SummarizedExperiment`".
cluster.col	Name of a column in `colData(se)` that contains a factor indicating cluster membership for each column (i.e. sample) in `se`.
assay.type	A string specifying which assay values to use, e.g., `"counts"` or `"logcounts"`.
threshold	Value above which the marker is considered detected.
n	Maximal number of markers allowed for each signature.
min.diff	Minimal difference (in the range 0-1) between the detection rate in the target cluster and the summarized detection rate in the other clusters. See argument `diff.method`.
min.prop	Minimal proportion of samples in the target cluster where the combined set of markers is detected.
diff.method	Method to contrast the detection rate in the target cluster to that of all other clusters. See Details section.

Value

A collection of signatures as a "Sets" object.

Details

`diff.method` controls how the detection rates in all clusters other than the target one are summarized before comparison with the detection rate in the target cluster. Features can be ranked using the minimal (`"min"`), mean (`"mean"`), median (`"median"`), or maximal (`"max"`) difference between the detection rate in the target cluster and those of all other clusters.
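As a rough illustration of the four options, the sketch below summarizes the pairwise differences for a single feature in base R. It assumes that the difference to each other cluster is computed first and then summarized; the package's internal implementation may differ. The values in `rates` and the helper `summarize_diff()` are hypothetical.

```r
# Hypothetical detection rates of one feature in three clusters.
rates <- c(A = 0.80, B = 0.50, C = 0.20)
target <- "A"

# Pairwise differences between the target cluster and each other cluster.
diffs <- rates[target] - rates[names(rates) != target]

# Hypothetical helper: summarize the pairwise differences with the chosen method.
summarize_diff <- function(diffs,
                           diff.method = c("min", "mean", "median", "max")) {
  diff.method <- match.arg(diff.method)
  switch(diff.method,
    min = min(diffs),
    mean = mean(diffs),
    median = median(diffs),
    max = max(diffs)
  )
}

# One summarized difference per method (about 0.30, 0.45, 0.45, and 0.60 here).
summaries <- sapply(
  c("min", "mean", "median", "max"),
  function(m) summarize_diff(diffs, m)
)
summaries
```

Under this reading, `"min"` is the most conservative choice, since a feature must exceed the detection rate of every other cluster by the required margin, whereas `"max"` only requires a large difference from at least one other cluster.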

Examples

# Example data ----
library(SummarizedExperiment)
nsamples <- 100
u <- matrix(rpois(20000, 1), ncol=nsamples)
rownames(u) <- paste0("Gene", sprintf("%03d", seq_len(nrow(u))))
colnames(u) <- paste0("Cell", sprintf("%03d", seq_len(ncol(u))))
se <- SummarizedExperiment(assays=list(counts=u))

colData(se)[, "cluster"] <- factor(sample(head(LETTERS, 3), ncol(se), replace=TRUE))

# Example usage ----

baseset <- learnMarkersByPositiveProportionDifference(se, cluster.col="cluster")

relations(baseset)
#> Hits object with 27 hits and 2 metadata columns:
#>             from        to | ProportionPositive minDifferenceProportion
#>        <integer> <integer> |          <numeric>               <numeric>
#>    [1]         1         1 |           0.814815                0.267196
#>    [2]         2         1 |           0.777778                0.229391
#>    [3]         3         1 |           0.777778                0.158730
#>    [4]         4         1 |           0.814815                0.148148
#>    [5]         5         1 |           0.851852                0.142174
#>    ...       ...       ... .                ...                     ...
#>   [23]        23         3 |           0.806452                0.163594
#>   [24]        24         3 |           0.741935                0.149343
#>   [25]        25         3 |           0.741935                0.146697
#>   [26]        26         3 |           0.806452                0.139785
#>   [27]        27         3 |           0.741935                0.122888
#>   -------
#>   nLnode: 27 / nRnode: 3
