Considerate contributions are very welcome!
Here are a few guidelines to help develop and maintain a consistent coding style.
Primary objectives
Most importantly, contributions should directly serve the primary objectives of the package, namely:
- Apply signatures to assign cell identities to new data sets
- Learn new signatures from data sets, in a format that can be applied as described in the first objective
Existing methods implemented in other R packages are welcome. Those should be handled as described in the New prediction methods section and unit tested as described in the Unit tests and code coverage section.
Proofs of concept
Ideally, a proof-of-concept R Markdown notebook should demonstrate the method before any new function is added to hancock (i.e., the notebook itself explicitly declares any new function).
Feedback and suggestions from both expert developers and prospective users on the implementation and expected usage of new functionality can save significant time, before effort is invested in packaging and documenting functions.
For an example, please refer to the proof-of-concept of the function predictByProportionPositive
available here.
Proof-of-concept vignettes may be subsequently updated to demonstrate the same use case, but calling functions implemented in the package. Refer to the demonstration of the function learnMarkersByPositiveProportionDifference
available here.
Unit tests and code coverage
Code coverage should remain as close as possible to 100%. Every function, both internal and exported, should be accompanied by its own unit test(s) as part of the same pull request.
A single unit test may include multiple expect_* assertions. Use as many expect_* assertions as appropriate.
Note that large functions that include several if statements and require multiple unit tests to cover every scenario can generally be refactored into multiple smaller functions that are easier to unit test individually.
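As a minimal sketch of the points above, the following unit test groups several expect_* assertions around one small helper. The helper .proportionPositive is hypothetical and only serves as an illustration; it is not claimed to be part of hancock.

```r
library(testthat)

# Hypothetical internal helper (an assumption for this example):
# proportion of strictly positive values in a numeric vector.
.proportionPositive <- function(x) {
    stopifnot(is.numeric(x))
    if (length(x) == 0L) {
        return(NA_real_)
    }
    mean(x > 0)
}

# One unit test, several assertions: typical input, empty input, invalid input.
test_that(".proportionPositive handles typical and edge cases", {
    expect_identical(.proportionPositive(c(0, 1, 2, 0)), 0.5)
    expect_identical(.proportionPositive(numeric(0)), NA_real_)
    expect_error(.proportionPositive("a"))
})
```

Because the helper has a single branch, three assertions are enough to cover every line; a larger function with many branches would need correspondingly more tests.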
The following code is useful to track down lines that are not covered by unit tests:
library(covr)
pc <- package_coverage()
report(pc)
NEWS file
Until the package is made available through the Bioconductor project, all new features should be described under the “hancock 0.99.0” section, as part of the same pull request. This will produce a manifest of functionality to accompany the initial submission to the Bioconductor project.
Internal functions
Internal functions should also be documented using roxygen comments (http://r-pkgs.had.co.nz/man.html) with Markdown formatting (https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html). However, their documentation does not have to be as comprehensive as that of exported functions. Required sections are:
- @title
- @description
- @param
- @return
- @rdname INTERNAL_<...> with <...> being the name of the function (without any leading “.”). Make sure that your .gitignore contains the entry INTERNAL_*. Do not push INTERNAL documentation online.
- @author
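The required sections listed above could look as follows for a small internal function. This is only a sketch: the function .topLabel, its argument, and the author name are placeholders invented for this example.

```r
#' @title Identify the top-scoring label
#'
#' @description
#' Internal helper that returns the name of the element with the highest score.
#'
#' @param scores A named numeric vector of scores.
#'
#' @return A character scalar, the name of the highest-scoring element.
#'
#' @rdname INTERNAL_topLabel
#'
#' @author Jane Doe (placeholder)
.topLabel <- function(scores) {
    names(scores)[which.max(scores)]
}
```

Note that the @rdname value drops the leading dot of the function name, so the generated file matches the INTERNAL_* entry in .gitignore.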
New prediction methods
New prediction methods should first be implemented as separate functions, individually exported in the NAMESPACE file. All prediction methods must accept object and se as their first two arguments, respectively:
- object: the GeneSetCollection or Sets
- se: the SummarizedExperiment
Additional, method-specific parameters may be accepted from the third argument onward.
Once implemented as its own function, a new method should be made available through the .predictAnyGeneSetClass function using a unique method identifier. Make sure the new identifier and method are documented in the ?predictSignatures man page.
Prediction methods should return the input SummarizedExperiment object updated as follows:
- In the colData slot, a DataFrame nested in a new (or updated) "hancock" column should contain at least a first column called prediction. This column should be populated with the highest-confidence prediction for each sample. Additional, method-specific columns may be present from the second column onward.
- In the metadata slot, a list in a new (or updated) "hancock" element should contain at least the following elements:
  - "GeneSets": the object of class GeneSetCollection or Sets containing the signatures used to make the predictions
  - "method": identifier of the method used to make the predictions
  - "packageVersion": version of the hancock package used to make the predictions
  - Additional, method-specific elements may appear after the above general metadata
For an example template, please refer to the prediction method predictByProportionPositive, made available using the "ProportionPositive" or "PP" identifiers.
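The return structure described above can be sketched with a deliberately simple method. Everything specific to this example is an assumption: the function predictByHighestMean and its "HighestMean" identifier are hypothetical, a named list of gene identifiers stands in for a GeneSetCollection or Sets object, and the package version is hard-coded where a real method would presumably query the installed hancock version.

```r
library(SummarizedExperiment)

# Hypothetical prediction method, for illustration only: each sample receives
# the signature whose genes have the highest mean value in the first assay.
# `object` is a named list of gene identifiers (stand-in for GeneSetCollection/Sets).
predictByHighestMean <- function(object, se) {
    values <- assay(se)
    scores <- matrix(
        0, nrow = ncol(se), ncol = length(object),
        dimnames = list(colnames(se), names(object))
    )
    for (name in names(object)) {
        scores[, name] <- colMeans(values[object[[name]], , drop = FALSE])
    }
    prediction <- names(object)[max.col(scores, ties.method = "first")]

    # colData: nested DataFrame whose first column is "prediction";
    # method-specific columns follow from the second column onward.
    colData(se)[["hancock"]] <- DataFrame(
        prediction = prediction,
        TopMean = unname(apply(scores, 1, max))
    )

    # metadata: list holding the three required elements.
    metadata(se)[["hancock"]] <- list(
        GeneSets = object,
        method = "HighestMean",
        # Placeholder version; an assumption made to keep the sketch self-contained.
        packageVersion = package_version("0.99.0")
    )
    se
}

# Usage on a toy data set with two genes and two samples.
counts <- matrix(
    c(5, 0, 0, 5), nrow = 2,
    dimnames = list(c("g1", "g2"), c("s1", "s2"))
)
se <- SummarizedExperiment(assays = list(counts = counts))
out <- predictByHighestMean(list(A = "g1", B = "g2"), se)
colData(out)[["hancock"]]
```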
New plotting functions
New plotting functions should accept se as their first argument, namely a SummarizedExperiment returned by any prediction method (see above).
Most importantly, plotting functions should first check that the input se object contains the results of the associated prediction method(s). In the future, this requirement may be made obsolete by the definition of SummarizedExperiment subclasses, “flagging” the presence of specific prediction results.
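One possible shape for such an input check is sketched below. The helper name .checkPredictionResults is hypothetical, and the checks assume the colData and metadata layout described in the New prediction methods section.

```r
library(SummarizedExperiment)

# Hypothetical helper: stop with an informative message unless `se` carries
# predictions made by one of the accepted method identifiers.
.checkPredictionResults <- function(se, method) {
    stopifnot(is(se, "SummarizedExperiment"))
    if (!"hancock" %in% colnames(colData(se))) {
        stop("colData(se) does not contain a 'hancock' column.")
    }
    if (!"prediction" %in% colnames(colData(se)[["hancock"]])) {
        stop("colData(se)[['hancock']] does not contain a 'prediction' column.")
    }
    storedMethod <- metadata(se)[["hancock"]][["method"]]
    if (is.null(storedMethod) || !storedMethod %in% method) {
        stop("Predictions were not made using: ", paste(method, collapse = ", "))
    }
    invisible(TRUE)
}

# Usage: a toy object that mimics the output of a prediction method.
se <- SummarizedExperiment(assays = list(counts = matrix(1:4, nrow = 2)))
colData(se)[["hancock"]] <- DataFrame(prediction = c("A", "B"))
metadata(se)[["hancock"]] <- list(method = "PP")
.checkPredictionResults(se, method = c("ProportionPositive", "PP"))
```

Accepting a vector of identifiers lets a plotting function tolerate both the full name and the short alias of its associated method.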
Plotting functions should return a minimal ggplot2::ggplot or ComplexHeatmap::Heatmap object, giving users maximal freedom to customize the plot.
For an example, please refer to plotProportionPositive, using the result of the "ProportionPositive" or "PP" method.
New learning methods
Similarly to new prediction methods, new learning methods should first be implemented as separate functions, individually exported in the NAMESPACE file. All learning methods must accept se as their first argument, namely the SummarizedExperiment from which to learn signatures. Additional method-specific parameters may be accepted from the second argument onward.
Once implemented as its own function, a new method should be made available through the learnSignatures method using a unique method identifier. Make sure the new identifier and method are documented in the ?learnSignatures man page.
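The mapping from method identifiers to individual functions might be organized along the following lines. This is a dependency-free sketch, not the actual learnSignatures implementation: learnByExample, learnSignaturesSketch, and the "Example" and "EX" identifiers are all hypothetical, and the stand-in learner returns a data.frame where a real method should return an object inheriting from BaseSets.

```r
# Hypothetical learning method (stand-in for an individually exported function).
# It returns a data.frame only to keep this sketch free of dependencies.
learnByExample <- function(se, n = 2L) {
    data.frame(
        element = head(rownames(se), n),
        set = "example",
        stringsAsFactors = FALSE
    )
}

# Hypothetical dispatcher: each learning function is reachable through a
# unique full identifier and a short alias.
learnSignaturesSketch <- function(se, method = c("Example", "EX"), ...) {
    method <- match.arg(method)
    switch(
        method,
        Example = ,  # falls through to the alias below
        EX = learnByExample(se, ...)
    )
}

# Usage: a plain matrix stands in for the SummarizedExperiment.
m <- matrix(
    1:6, nrow = 3,
    dimnames = list(c("g1", "g2", "g3"), c("s1", "s2"))
)
learnSignaturesSketch(m, method = "EX", n = 2L)
```

With this layout, adding a method means adding its identifiers to the method argument and one branch to the switch statement, which keeps the list of supported identifiers easy to document in one place.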
Learning methods should return an object inheriting from the BaseSets class, defined in the unisets package.
For an example template, please refer to the learning method learnMarkersByPositiveProportionDifference, made available using the "PositiveProportionDifference" or "PPD" identifiers.
Metadata produced by learning methods may be stored as additional columns of the returned BaseSets object.
Terminology
Until the Cell Ontology or the Human Cell Atlas establishes a reference terminology, avoid the terms “cell type” and “(sub-)types” in the code and accompanying documentation. Those terms are increasingly confusing and open to interpretation as single-cell technologies continuously advance our understanding of cell differentiation. Prefer terms that address individual aspects of the definition of “cell types” more specifically, for example functionally distinct cell populations or compartments, canonical sets of cell surface proteins, and transcriptional profiles.