load_peaks_counts_dir_into_sce.Rd
Load a directory's worth of 3-column counts.tab.gz files from polyApipe.py into a SingleCellExperiment for analysis. Each cell will be uniquely named, and colData will include the batch. The 'batch_regex' can be used to extract nice sample names from filenames.
load_peaks_counts_dir_into_sce( counts_file_dir, peak_info_file, output, batch_regex = "(.*)\\.tab\\.gz", ... )
counts_file_dir | The directory of counts files (in .tab.gz format) as output by the polyApipe.py script (usually called <something>_counts) |
---|---|
peak_info_file | GTF formatted peak file _specifically_ as output from polyApipe.py |
output | Where to save the sce object on disk using saveHDF5SummarizedExperiment. This path should not already exist; the directory will be created. |
batch_regex | A regex string used to extract a nice sample name from the names of the counts files in counts_file_dir. This will become the batch name (and prefix of cell name) for each cell in colData. If not specified, this will just use the filename up to but excluding '.tab.gz'. The regex should have one capture group '()' and pull out something unique for every file. (Default = '(.*)\.tab\.gz') |
... | Other parameters passed through to load_peaks_counts_files_into_sce/load_peaks_counts_into_sce |
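
As a rough illustration of how a single-capture-group batch_regex turns filenames into batch names, the sketch below uses base R's sub(). The file names and the extract_batch() helper are hypothetical, and the package may apply the regex differently internally. Note that in an R string literal each backslash is doubled, so the default pattern is written "(.*)\\.tab\\.gz".

```r
# Hypothetical file names; only the patterns come from this page.
files <- c("FullRun_sampleA.tab.gz", "FullRun_sampleB.tab.gz")

# Assumption: the capture group is extracted roughly like this.
# extract_batch() is an illustrative helper, not part of polyApiper.
extract_batch <- function(files, batch_regex = "(.*)\\.tab\\.gz") {
  # Replace the whole match with the contents of the first capture group.
  batch <- sub(batch_regex, "\\1", basename(files))
  # The documentation asks for something unique for every file.
  if (anyDuplicated(batch) > 0) {
    stop("batch_regex does not give a unique name per file")
  }
  batch
}

default_names <- extract_batch(files)
custom_names  <- extract_batch(files, batch_regex = "FullRun_(.*)\\.tab\\.gz")
print(default_names)
print(custom_names)
```

Trying a pattern on list.files(counts_file_dir) in this way before a full run is a cheap check that every file yields a distinct batch name.
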
Other peak counts loading functions: load_peaks_counts_files_into_sce(), load_peaks_counts_into_sce()
counts_dir <- dirname(system.file("extdata",
    "demo_dataset/demo_counts/SRR5259354_demo.tab.gz",
    package = "polyApiper"))
peak_info_file <- system.file("extdata",
    "demo_dataset/demo_polyA_peaks.gff",
    package = "polyApiper")

if (FALSE) {
  sce <- load_peaks_counts_dir_into_sce(counts_dir,
      peak_info_file = peak_info_file,
      output = "demodirsce",
      min_reads_per_barcode = 1)  # Required for tiny demo data.

  sce <- load_peaks_counts_dir_into_sce("myexpr_counts/",
      "my_expr_polyA_peaks.gff",
      "./myexpr_sce")

  sce <- load_peaks_counts_dir_into_sce(
      "data/MyExperimentFullRun20190516_counts/",
      peak_info_file = "data/MyExperimentFullRun20190516_polyA_peaks.gff",
      output = "data/MyExperimentFullRun20190516_sce",
      batch_regex = 'FullRun_(.*)\\.tab\\.gz')
}