Contributing

There are many ways to contribute. The most obvious ones are helping with the documentation, helping in the user group and, of course, working on the source itself.

Documentation

I'm using mkdocs to generate this site, which has been very easy to use. All documentation is written in plain markdown and lives in the docs/ directory of the main repo. You can simply fork the RNAsik repository, make changes to the docs and send me a pull request (PR). Any changes are very welcome, even a one-letter spelling correction (there'll be more than one), but all changes need to come through a PR. This not only acknowledges you as a contributor, but also lets me review the changes quickly and pull them in easily.

A quick note on mkdocs: it is pretty easy to install with pip, in a virtualenv if you prefer (you should).

virtualenv mkdocs_env
source mkdocs_env/bin/activate
pip install mkdocs
mkdocs build
mkdocs gh-deploy

This will deploy your copy of the RNAsik docs to your GitHub Pages (gh-pages) branch. You don't actually need to do that; there is no need to deploy your own copy of the docs. Just use mkdocs serve (see below) to preview your changes and send them through to me.

git clone https://github.com/MonashBioinformaticsPlatform/RNAsik-pipe
cd RNAsik-pipe
# do docs changes
mkdocs serve

This will give you live updates to your copy of the docs. The default URL should be localhost:8000, but mkdocs will tell you that once you've started the server. Then simply use your favourite text editor to edit the markdown documents. Commit your changes and don't be afraid to be verbose: say what you've added, changed or removed in your commit message. Then send me a PR.
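The commit step could look roughly like the sketch below. The function name commit_docs_change, the branch name and the commit message are all made up for illustration; the only assumption about the repository is that the docs live under docs/, as described above.

```shell
# Sketch: put a docs change on its own branch and commit it.
# commit_docs_change is a hypothetical helper, not part of RNAsik.
commit_docs_change() {
    branch="$1"    # name of the new branch (placeholder)
    message="$2"   # commit message; say what was added/changed/removed

    git checkout -b "$branch" || return 1   # create and switch to the branch
    git add docs/ || return 1               # stage only the docs changes
    git commit -m "$message" || return 1    # record the change
}

# Example usage, run from the root of your clone:
#   commit_docs_change fix-docs-typos "Fix typos in the contributing page"
```

After that, push the branch (for example git push origin fix-docs-typos, assuming origin points at your fork) and open the PR on GitHub.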

Developing pipeline further

I need to write a more comprehensive developer guide sometime soon. Any contributions are again extremely welcome and, as mentioned in the documentation section above, they need to come through a pull request (PR).

To briefly summarise the layout of src/:

Building conda package

First of all you need to install (mini)conda.

I run it like this

bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/.miniconda

These are fairly routine steps, but if this is your first time you'll need to do them

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install conda-build anaconda-client

Note that you can use the -y flag to assume yes instead of manually entering yes/no.

git clone https://github.com/serine/bioconda-recipes
cd bioconda-recipes
conda build recipes/rnasik
conda build recipes/rnasik --output
conda install -y --use-local rnasik

Once you've logged in, anaconda will store a login token somewhere in your home directory:

~/.continuum/anaconda-client

RNAsik conda environment

Since version 1.5.4 the RNAsik conda package only contains the RNAsik pipeline, without any additional bioinformatics tools. This is because I was having issues building a new version of the conda package. I'm guessing this has to do with the fact that other bioinformatics tools like samtools aren't true dependencies: although I was specifying them under "requirements: run:", the build was spending too much time in "solving environment". Instead I exported the full environment and host it on GitHub, so that users can simply grab that yaml and re-create the RNAsik env.

This is how to create a new environment yaml config file:

# write the list of packages, one per line
cat > tools.txt <<'EOF'
rnasik
bigdatascript
fastqc
multiqc
bedtools
star==2.7.2b
subread
samtools
picard
bwa
skewer
je-suite
qualimap
EOF

# install the packages one at a time
while read -r t; do conda install -y "$t"; done < tools.txt

I know that you can give all of those tools on the command line at once, but for some reason that gets stuck in "solving environment" as well.

conda env export > rnasik-1.5.4.yaml

Optionally you can create an empty, clean environment before installing all of the packages with:

conda create --name rnasik-1.5.4
conda activate rnasik-1.5.4

And then install and export. This additional step is probably the better approach, since you are creating a clean RNAsik environment.
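The whole sequence (clean environment, one-by-one install, export) could be wrapped into a small script along these lines. This is only a sketch: build_env, run and DRY_RUN are names made up for illustration, and instead of activating the environment it passes --name to conda install and conda env export, which is a variation on the commands shown above rather than what I actually ran.

```shell
# Sketch: create a clean conda env, install tools one at a time, export it.
# build_env, run and DRY_RUN are hypothetical names, not part of RNAsik.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_env ENV_NAME TOOLS_FILE
build_env() {
    env_name="$1"     # e.g. rnasik-1.5.4
    tools_file="$2"   # e.g. tools.txt, one package per line

    run conda create -y --name "$env_name" || return 1

    # Install packages one by one; skip empty lines and lines starting with '#'.
    while IFS= read -r tool || [ -n "$tool" ]; do
        case "$tool" in
            ''|'#'*) continue ;;
        esac
        run conda install -y --name "$env_name" "$tool" < /dev/null || return 1
    done < "$tools_file"

    # Export the resulting environment to ENV_NAME.yaml.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "conda env export --name $env_name > $env_name.yaml"
    else
        conda env export --name "$env_name" > "$env_name.yaml"
    fi
}

# Example usage:
#   DRY_RUN=1; build_env rnasik-1.5.4 tools.txt   # only print the commands
#   DRY_RUN=0; build_env rnasik-1.5.4 tools.txt   # actually run conda
```

Running it with DRY_RUN=1 first is a cheap way to check which packages from tools.txt will be installed before conda spends time solving the environment.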

Travis CI and testing

Continuous integration is very useful to ensure your code is checked continuously. RNAsik code is checked (tested) with every commit. However, that testing is only as good as I, or hopefully we, make it. BigDataScript provides a very nice unit testing mechanism; the trick of course is that the tests have to be written. Currently only a very small proportion of the code is actually covered by tests. A lot of work is needed in this space. Of course one might say that I should have been writing tests as I was writing my code. Perhaps, but I'm new to this and better late than never!

Have a look at the bds docs on how to write tests.