Analyse the output of polyApipe.py — do

Analyse the output of polyApipe.py. The resulting directory can be loaded with load_banquet. An HTML report is also produced.

do_pipeline(
  out_path,
  counts_file_dir = NULL,
  counts_files = NULL,
  batch_names = "",
  peak_info_file,
  organism,
  cell_name_func = function(batch, cell) paste0(batch, cell),
  cells_to_use = NULL,
  remove_mispriming = TRUE,
  utr_or_extension_only = FALSE,
  peak_min_present = 50,
  peak_min_prop = 0.01,
  do_logNormCounts = TRUE,
  do_computeSumFactors = TRUE,
  title = "polyApiper pipeline run",
  stages = seq_along(PIPELINE_STAGES)
)

Arguments

out_path	Output directory name.
counts_file_dir	A directory containing counts files. Batch names are taken from the basenames of the counts files, without the .tab.gz suffix. Give either this argument or `counts_files` but not both.
counts_files	Alternative to counts_file_dir. One or more filenames for .tab.gz files produced by polyApipe.py. Give either this argument or `counts_file_dir` but not both.
batch_names	If using counts_files, give a vector of the batch/sample names, in the same order as counts_files.
peak_info_file	GTF formatted peak file as output from polyApipe.py.
organism	Organism directory, as created by `do_ensembl_organism`.
cell_name_func	Function that takes a batch name and a cell name and returns the cell name to use. The default concatenates the two with paste0(batch, cell).
cells_to_use	Character vector of cells to use.
remove_mispriming	Remove peaks considered to be mispriming peaks.
utr_or_extension_only	Remove peaks in the 5'UTR, exons, or introns.
peak_min_present	A peak is retained if it is present in at least this number of cells.
peak_min_prop	A peak is used for APA and gene expression calculations only if it constitutes at least this proportion of total UMIs for the gene.
do_logNormCounts	Compute logcounts for gene and peak expression?
do_computeSumFactors	When computing logcounts, use scran::computeSumFactors? If not, unadjusted cell library sizes are used.
title	Title to use in report.
stages	Stages of the pipeline to run. Stages are listed in PIPELINE_STAGES.
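
The batch naming rule and the default cell_name_func described above can be reproduced with base R alone. This is a minimal sketch, not the package's internal code: the file names and the barcode are invented, and sep_name is a hypothetical alternative naming function.

```r
# Hypothetical counts files, as might be passed via counts_files.
counts_files <- c("counts/sampleA.tab.gz", "counts/sampleB.tab.gz")

# Batch name = basename of the file without the .tab.gz suffix.
batch_names <- sub("\\.tab\\.gz$", "", basename(counts_files))

# The default from the usage section: batch and cell are pasted together.
default_name <- function(batch, cell) paste0(batch, cell)

# A hypothetical alternative that keeps batch and barcode visually separate.
sep_name <- function(batch, cell) paste(batch, cell, sep = "_")

print(batch_names)
print(default_name(batch_names, "ACGTACGT"))
print(sep_name(batch_names, "ACGTACGT"))
```

Whichever function is used, the names in cells_to_use presumably need to follow the same scheme; that relationship is an assumption here and is worth checking against the package before relying on it.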

Value

There is no return value. Results are placed in the out_path directory. They can be loaded using load_banquet(out_path).
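
A run followed by loading the results might look like the sketch below. All paths are hypothetical placeholders, and the package name polyApiper is assumed from the default title. The call is guarded so that the snippet does nothing when the package or the inputs are missing.

```r
# All paths below are hypothetical placeholders.
pipeline_args <- list(
  out_path        = "pipeline_out",
  counts_file_dir = "polyApipe_counts",     # directory of .tab.gz counts files
  peak_info_file  = "polyApipe_peaks.gtf",  # GTF peak file from polyApipe.py
  organism        = "organism_dir",         # as created by do_ensembl_organism
  title           = "Example run"
)

# Only attempt the run when the package and the inputs are actually available.
can_run <- requireNamespace("polyApiper", quietly = TRUE) &&
  dir.exists(pipeline_args$counts_file_dir) &&
  file.exists(pipeline_args$peak_info_file) &&
  dir.exists(pipeline_args$organism)

if (can_run) {
  # do_pipeline has no return value; results are written to out_path.
  do.call(polyApiper::do_pipeline, pipeline_args)
  banquet <- polyApiper::load_banquet(pipeline_args$out_path)
}
```

Here counts_file_dir is given and counts_files is left out, in line with the rule that only one of the two should be supplied.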

Details

Use the stages argument to run only specific stages of the pipeline. It may be a character vector of stage names or an integer vector indexing into the global PIPELINE_STAGES, which lists the possible stages.
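
The two ways of specifying stages can be illustrated with a stand-in vector. This is a sketch under stated assumptions: the stage names below are invented and are not the real contents of PIPELINE_STAGES, and resolve_stages is a hypothetical helper showing one plausible way names could map to indices, not the package's own logic.

```r
# Stand-in for the package's PIPELINE_STAGES; these names are invented.
PIPELINE_STAGES <- c("stage_a", "stage_b", "stage_c")

# The default in the usage section selects every stage.
all_stages <- seq_along(PIPELINE_STAGES)

# Hypothetical helper: turn either form of `stages` into integer indices.
resolve_stages <- function(stages) {
  if (is.character(stages)) {
    idx <- match(stages, PIPELINE_STAGES)
    if (anyNA(idx)) stop("Unknown stage name")
    idx
  } else {
    as.integer(stages)
  }
}

by_name  <- resolve_stages(c("stage_b", "stage_c"))
by_index <- resolve_stages(2:3)

# Both forms select the same stages.
stopifnot(identical(by_name, by_index))
```

Printing PIPELINE_STAGES in a session with the package loaded shows the actual stage names to use.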