Analyse the output of polyApipe.py. The resulting directory can be loaded
with load_banquet. An HTML report is also produced.

    do_pipeline(
      out_path,
      counts_file_dir = NULL,
      counts_files = NULL,
      batch_names = "",
      peak_info_file,
      organism,
      cell_name_func = function(batch, cell) paste0(batch, cell),
      cells_to_use = NULL,
      remove_mispriming = TRUE,
      utr_or_extension_only = FALSE,
      peak_min_present = 50,
      peak_min_prop = 0.01,
      do_logNormCounts = TRUE,
      do_computeSumFactors = TRUE,
      title = "polyApiper pipeline run",
      stages = seq_along(PIPELINE_STAGES)
    )

| Argument | Description |
|---|---|
| out_path | Output directory name. |
| counts_file_dir | A directory containing counts files. Batch names are taken from the basename of the count files without the .tab.gz suffix. Give either this argument or counts_files. |
| counts_files | Alternative to counts_file_dir. One or more filenames for .tab.gz files produced by polyApipe.py. Give either this argument or counts_file_dir. |
| batch_names | If using counts_files, give a vector of the batch/sample names, in the same order as counts_files. |
| peak_info_file | GTF formatted peak file as output from polyApipe.py. |
| organism | Organism directory, as created by |
| cells_to_use | Character vector of cells to use. |
| remove_mispriming | Remove peaks considered to be mispriming peaks. |
| utr_or_extension_only | Remove peaks in the 5'UTR, exons, or introns. |
| peak_min_present | A peak is retained if it is present in at least this number of cells. |
| peak_min_prop | A peak is used for APA and gene expression calculations only if it constitutes at least this proportion of total UMIs for the gene. |
| do_logNormCounts | Compute logcounts for gene and peak expression? |
| do_computeSumFactors | When computing logcounts, use scran::computeSumFactors? If not, unadjusted cell library sizes are used. |
| title | Title to use in report. |
| stages | Stages of the pipeline to run. Stages are listed in PIPELINE_STAGES. |
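
The naming rules described for counts_file_dir and cell_name_func can be illustrated with a small base-R sketch. The file names and cell barcodes below are made up, and the suffix-stripping line is only an assumption about how the rule could be implemented, not the package's actual code.

```r
# Hypothetical count files, as might be found in counts_file_dir.
counts_files <- c("data/counts/batch1.tab.gz", "data/counts/batch2.tab.gz")

# Batch names: basename of each file without the .tab.gz suffix
# (assumed implementation of the rule described for counts_file_dir).
batch_names <- sub("\\.tab\\.gz$", "", basename(counts_files))

# Default cell_name_func, copied from the do_pipeline signature.
cell_name_func <- function(batch, cell) paste0(batch, cell)

# Hypothetical cell barcodes, one per batch.
cell_names <- cell_name_func(batch_names, c("AAACCTG", "TTTGGCA"))

print(batch_names)
print(cell_names)
```

A custom cell_name_func, for example function(batch, cell) paste0(batch, "_", cell), would presumably be supplied when a separator between batch and barcode is wanted.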
There is no return value. Results are placed in the out_path directory. They can be loaded using load_banquet(out_path).
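
A minimal usage sketch follows. All paths are hypothetical placeholders, and the package name polyApiper is an assumption based on the default report title. The call is guarded so that it only runs when the package and the input files are actually present.

```r
# Hypothetical inputs; replace with real polyApipe.py output
# and a real organism directory.
args <- list(
  out_path        = "pipeline_output",
  counts_file_dir = "polyApipe_output/counts",
  peak_info_file  = "polyApipe_output/peaks.gtf",
  organism        = "organism_dir"
)

# Check that the hypothetical inputs exist before attempting a run.
inputs_exist <- dir.exists(args$counts_file_dir) &&
  file.exists(args$peak_info_file) &&
  dir.exists(args$organism)

# Package name "polyApiper" is assumed here.
if (requireNamespace("polyApiper", quietly = TRUE) && inputs_exist) {
  # No return value; results are written to args$out_path.
  do.call(polyApiper::do_pipeline, args)

  # Load the results back from the output directory.
  banquet <- polyApiper::load_banquet(args$out_path)
}
```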
You can choose to run only specific stages of the pipeline with the stages argument. See the global PIPELINE_STAGES for possible stages. stages is a character vector or an integer vector indexing into PIPELINE_STAGES.
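
To show how integer and character stage selections relate, here is a small base-R sketch. The PIPELINE_STAGES vector in it is a stand-in with hypothetical stage names; the real stage names should be read from the package's PIPELINE_STAGES.

```r
# Stand-in for the package's PIPELINE_STAGES; these stage names are hypothetical.
PIPELINE_STAGES <- c("load", "filter", "normalise", "report")

# The default value of the stages argument: every stage, as integer indices.
all_stages <- seq_along(PIPELINE_STAGES)

# Selecting by integer index or by name picks out the same stages.
by_index <- PIPELINE_STAGES[c(1, 2)]
by_name  <- intersect(PIPELINE_STAGES, c("load", "filter"))

# Converting a stage name to indices, e.g. to run from that stage onwards.
resume_from <- match("normalise", PIPELINE_STAGES)
remaining   <- resume_from:length(PIPELINE_STAGES)

print(by_index)
print(remaining)
```

Either form, for example stages = remaining or stages = by_name, could then be passed to do_pipeline, assuming the names match entries of the real PIPELINE_STAGES.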