Analyse the output of polyApipe.py. The resulting directory can be loaded with load_banquet. An HTML report is also produced.

do_pipeline(
  out_path,
  counts_file_dir = NULL,
  counts_files = NULL,
  batch_names = "",
  peak_info_file,
  organism,
  cell_name_func = function(batch, cell) paste0(batch, cell),
  cells_to_use = NULL,
  remove_mispriming = TRUE,
  utr_or_extension_only = FALSE,
  peak_min_present = 50,
  peak_min_prop = 0.01,
  do_logNormCounts = TRUE,
  do_computeSumFactors = TRUE,
  title = "polyApiper pipeline run",
  stages = seq_along(PIPELINE_STAGES)
)

Arguments

out_path

Output directory name.

counts_file_dir

A directory containing counts files. Batch names are taken from the basenames of the counts files, without the .tab.gz suffix. Give either this argument or counts_files, but not both.

counts_files

Alternative to counts_file_dir. One or more filenames for .tab.gz files produced by polyApipe.py. Give either this argument or counts_file_dir but not both.

batch_names

If using counts_files, give a vector of the batch/sample names, in the same order as counts_files.
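
The batch naming described for counts_file_dir, and the default cell_name_func shown in the usage, can be sketched in base R as follows. The file names, barcodes, and the sep_name_func alternative are hypothetical and only illustrate the naming; this is not the package's internal code.

```r
# Hypothetical counts files; only the names matter here, the files need not exist.
counts_files <- c("run1/sampleA.tab.gz", "run2/sampleB.tab.gz")

# Batch name = basename of the file with the .tab.gz suffix stripped.
batch_names <- sub("\\.tab\\.gz$", "", basename(counts_files))

# Default cell_name_func from the usage: batch and cell are pasted together.
cell_name_func <- function(batch, cell) paste0(batch, cell)
default_names <- cell_name_func(batch_names[1], c("AAAC", "TTTG"))

# A hypothetical alternative that inserts a separator for readability.
sep_name_func <- function(batch, cell) paste0(batch, "_", cell)
sep_names <- sep_name_func(batch_names[1], c("AAAC", "TTTG"))
```

A function like sep_name_func could be passed as cell_name_func, since it accepts the same (batch, cell) arguments as the default.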

peak_info_file

GTF formatted peak file as output from polyApipe.py.

organism

Organism directory, as created by do_ensembl_organism.

cell_name_func

Function that builds each cell's name from its batch name and its cell identifier, called as cell_name_func(batch, cell). The default pastes the two together.

cells_to_use

Character vector of cells to use.

remove_mispriming

Remove peaks considered to be mispriming peaks.

utr_or_extension_only

If TRUE, remove peaks in the 5'UTR, exons, or introns.

peak_min_present

A peak is retained if it is present in at least this number of cells.
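
As a rough illustration of this filter, the sketch below applies a presence threshold to a toy peak-by-cell count matrix. It assumes that "present" means a non-zero count and that the threshold is a minimum; the matrix values and the small threshold are made up, and the package's actual implementation may differ.

```r
# Toy peak-by-cell UMI count matrix (3 peaks, 4 cells); values are made up.
counts <- matrix(
  c(1, 0, 2, 3,
    0, 0, 0, 5,
    4, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("peak1", "peak2", "peak3"), paste0("cell", 1:4)))

peak_min_present <- 2  # tiny threshold for the toy data; the default is 50

# Assumption: a peak is "present" in a cell when its count is non-zero.
cells_present <- rowSums(counts > 0)

# Keep peaks seen in at least peak_min_present cells.
kept_peaks <- names(cells_present)[cells_present >= peak_min_present]
```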

peak_min_prop

A peak is used for APA and gene expression calculations only if it constitutes at least this proportion of total UMIs for the gene.
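
The proportion rule can be sketched on made-up data as below. The data frame, its column names, and the choice to compute proportions from UMI totals summed over all cells are assumptions for illustration, not a description of the package internals.

```r
# Made-up total UMIs per peak, with the gene each peak is assigned to.
peaks <- data.frame(
  peak = c("p1", "p2", "p3", "p4"),
  gene = c("geneA", "geneA", "geneA", "geneB"),
  umis = c(990, 5, 5, 40))

peak_min_prop <- 0.01

# Total UMIs of the gene each peak belongs to, aligned with the peak rows.
gene_total <- ave(peaks$umis, peaks$gene, FUN = sum)

# Proportion of its gene's UMIs contributed by each peak.
peaks$prop <- peaks$umis / gene_total

# Peaks at or above the threshold would be used for APA and gene expression.
peaks$used <- peaks$prop >= peak_min_prop
```

Here p2 and p3 each contribute 0.5% of geneA's UMIs, so with the default threshold of 0.01 only p1 and p4 would be used.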

do_logNormCounts

Compute logcounts for gene and peak expression?

do_computeSumFactors

When computing logcounts, use scran::computeSumFactors? If not, unadjusted cell library sizes are used.
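
To make the "unadjusted cell library sizes" option concrete, here is a minimal base R sketch of library-size normalisation on a toy matrix. It assumes size factors scaled to a mean of 1 and a log2 transform with a pseudocount of 1, which is a common convention; the exact computation used by the pipeline is not stated here and may differ.

```r
# Toy gene-by-cell count matrix; cell2 has three times the depth of cell1.
counts <- matrix(c(2, 8, 6, 24), nrow = 2,
  dimnames = list(c("gene1", "gene2"), c("cell1", "cell2")))

# Library size = total counts per cell.
library_sizes <- colSums(counts)

# Size factors: library sizes scaled so that their mean is 1.
size_factors <- library_sizes / mean(library_sizes)

# Divide each cell (column) by its size factor, then log-transform.
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)
```

In this toy case the two cells differ only in sequencing depth, so their normalised log-expression values come out identical.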

title

Title to use in report.

stages

Stages of the pipeline to run. Stages are listed in PIPELINE_STAGES.

Value

There is no return value. Results are placed in the out_path directory. They can be loaded using load_banquet(out_path).
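
A minimal call might look like the sketch below. All paths are hypothetical placeholders, and the package name polyApiper is inferred from the default title. The call is guarded so that the snippet does nothing unless the package is installed and the placeholder inputs exist.

```r
# All paths below are hypothetical placeholders.
counts_dir   <- "polyApipe_output/counts"   # directory of .tab.gz counts files
peak_file    <- "polyApipe_output/peaks.gtf" # GTF peak file from polyApipe.py
organism_dir <- "organism_dir"               # as created by do_ensembl_organism
out_dir      <- "pipeline_out"

inputs_exist <- dir.exists(counts_dir) &&
  file.exists(peak_file) &&
  dir.exists(organism_dir)

if (requireNamespace("polyApiper", quietly = TRUE) && inputs_exist) {
  # Run the pipeline; results are written to out_dir, nothing is returned.
  polyApiper::do_pipeline(
    out_path = out_dir,
    counts_file_dir = counts_dir,
    peak_info_file = peak_file,
    organism = organism_dir)

  # Load the results back from the output directory.
  banquet <- polyApiper::load_banquet(out_dir)
}
```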

Details

You can choose to run only specific stages of the pipeline with the stages argument. See the global PIPELINE_STAGES for the possible stages. stages may be either a character vector or an integer vector indexing into PIPELINE_STAGES.
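
The two ways of selecting stages can be illustrated with a stand-in vector. The stage names below are hypothetical, since the real names are defined by PIPELINE_STAGES in the package; the sketch only shows how an integer selection and a character selection can refer to the same stages.

```r
# Hypothetical stand-in for PIPELINE_STAGES; the real stage names may differ.
example_stages <- c("first_stage", "second_stage", "third_stage")

# Mirrors the default stages = seq_along(PIPELINE_STAGES): every stage.
all_stages <- seq_along(example_stages)

# Selecting the last two stages, by integer index or by name.
by_index <- c(2L, 3L)
by_name <- c("second_stage", "third_stage")

# Names can be mapped to the equivalent integer indices with match().
name_as_index <- match(by_name, example_stages)
```

Either form would then be supplied as the stages argument, for example to re-run only later stages after earlier ones have completed.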