do_pipeline.Rd
Analyse the output of polyApipe.py. The resulting directory can be loaded with load_banquet. An HTML report is also produced.
do_pipeline(
  out_path,
  counts_file_dir = NULL,
  counts_files = NULL,
  batch_names = "",
  peak_info_file,
  organism,
  cell_name_func = function(batch, cell) paste0(batch, cell),
  cells_to_use = NULL,
  remove_mispriming = TRUE,
  utr_or_extension_only = FALSE,
  peak_min_present = 50,
  peak_min_prop = 0.01,
  do_logNormCounts = TRUE,
  do_computeSumFactors = TRUE,
  title = "polyApiper pipeline run",
  stages = seq_along(PIPELINE_STAGES)
)
Argument | Description |
---|---|
out_path | Output directory name. |
counts_file_dir | A directory containing counts files. Batch names are taken from the basename of the count files without the .tab.gz suffix. Give either this argument or counts_files. |
counts_files | Alternative to counts_file_dir. One or more filenames for .tab.gz files produced by polyApipe.py. Give either this argument or counts_file_dir. |
batch_names | If using counts_files, give a vector of the batch/sample names, in the same order as counts_files. |
peak_info_file | GTF formatted peak file as output from polyApipe.py. |
organism | Organism directory, as created by |
cells_to_use | Character vector of cells to use. |
remove_mispriming | Remove peaks considered to be mispriming peaks. |
utr_or_extension_only | Remove peaks in the 5'UTR, exons, or introns. |
peak_min_present | A peak is retained if it is present in at least this number of cells. |
peak_min_prop | A peak is used for APA and gene expression calculations only if it constitutes at least this proportion of total UMIs for the gene. |
do_logNormCounts | Compute logcounts for gene and peak expression? |
do_computeSumFactors | When computing logcounts, use scran::computeSumFactors? If not, unadjusted cell library sizes are used. |
title | Title to use in report. |
stages | Stages of the pipeline to run. Stages are listed in PIPELINE_STAGES. |
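
Two of the arguments above can be illustrated in base R. The sketch below shows how batch names could be derived from count file basenames by dropping the .tab.gz suffix, and how the default cell_name_func combines a batch name with a cell barcode. The file paths, the barcode, and the underscore-separated alternative are hypothetical, and the suffix-stripping line only approximates the documented rule; it is not the package's own code.

```r
# Hypothetical count files, named as polyApipe.py output might be.
counts_files <- c("runs/sampleA.tab.gz", "runs/sampleB.tab.gz")

# Batch name = file basename without the .tab.gz suffix
# (an approximation of the rule described for counts_file_dir).
batch_names <- sub("\\.tab\\.gz$", "", basename(counts_files))

# Default cell_name_func, copied from the usage section.
default_cell_name <- function(batch, cell) paste0(batch, cell)

# Hypothetical alternative that separates batch and barcode with "_".
underscore_cell_name <- function(batch, cell) paste(batch, cell, sep = "_")

# "ACGTACGT" is a made-up barcode used only for illustration.
default_names <- default_cell_name(batch_names, "ACGTACGT")
underscore_names <- underscore_cell_name(batch_names, "ACGTACGT")

print(batch_names)
print(default_names)
print(underscore_names)
```

A separator such as the underscore above may make cell names easier to split back into batch and barcode later, at the cost of differing from the default naming.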
There is no return value. Results are placed in the out_path directory. They can be loaded using load_banquet(out_path).
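
A minimal sketch of a run followed by loading the results might look like the following. It assumes the functions are exported by a package named polyApiper (inferred from the default title, not stated on this page), and every path is a hypothetical placeholder. The call is guarded so the snippet does nothing harmful when the package or the inputs are absent.

```r
# All paths below are hypothetical placeholders; replace with real inputs.
counts_dir   <- "polyApipe_output/counts"     # directory of .tab.gz count files
peak_file    <- "polyApipe_output/peaks.gtf"  # GTF peak file from polyApipe.py
organism_dir <- "organism_dir"                # organism directory
out_dir      <- "pipeline_output"             # where results will be written

# Only attempt the run if the (assumed) package and all inputs are available.
inputs_ready <- requireNamespace("polyApiper", quietly = TRUE) &&
  dir.exists(counts_dir) &&
  file.exists(peak_file) &&
  dir.exists(organism_dir)

if (inputs_ready) {
  # No return value is expected; results are written under out_dir.
  polyApiper::do_pipeline(
    out_path = out_dir,
    counts_file_dir = counts_dir,
    peak_info_file = peak_file,
    organism = organism_dir,
    title = "Example run"
  )
  # Load the results back from the output directory.
  banquet <- polyApiper::load_banquet(out_dir)
} else {
  message("Package or inputs not found; skipping the example run.")
}
```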
You can choose to run only specific stages of the pipeline with the stages argument. See the global PIPELINE_STAGES for possible stages. The stages argument accepts either a character vector or an integer vector indexing into PIPELINE_STAGES.
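
The two accepted forms of stages can be illustrated with a stand-in vector. In the sketch below, DEMO_STAGES and its stage names are hypothetical (the real names live in PIPELINE_STAGES), and resolve_stages is an illustrative helper, not a function from the package. It assumes that character values are stage names to be looked up in the stage list.

```r
# Hypothetical stand-in for PIPELINE_STAGES; the real stage names may differ.
DEMO_STAGES <- c("load", "filter", "report")

# Illustrative helper: turn either form of `stages` into integer indices.
resolve_stages <- function(stages, all_stages) {
  if (is.character(stages)) {
    # Assumption: character values name stages in all_stages.
    idx <- match(stages, all_stages)
    if (anyNA(idx)) {
      stop("Unknown stage(s): ", paste(stages[is.na(idx)], collapse = ", "))
    }
  } else {
    idx <- as.integer(stages)
    if (any(idx < 1L | idx > length(all_stages))) {
      stop("Stage index out of range")
    }
  }
  idx
}

# The default in the usage section, seq_along(...), selects every stage.
all_idx <- resolve_stages(seq_along(DEMO_STAGES), DEMO_STAGES)

# Selecting a subset by position or by name.
by_index <- resolve_stages(c(1, 3), DEMO_STAGES)
by_name  <- resolve_stages(c("filter", "report"), DEMO_STAGES)

print(DEMO_STAGES[by_index])
print(DEMO_STAGES[by_name])
```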