Using Bioinformatics Software Through Singularity and Docker

Bastian Schiffthaler
Jul 23, 2020

Modernize your bioinformatics workflows and save yourself the headache of managing complicated dependency stacks. All with a free side dish of reproducible research.

Figure: Output of the lolcow app run inside a Singularity image. Using Singularity is simple, especially for people who work on compute clusters. Shown here: using Singularity to build and run the “lolcow” image.

Installing bioinformatics software is one of those things: sometimes it’s as simple as downloading a binary from the internet or typing conda install, other times it’s a long struggle with source builds, stacks of dependencies and compiler quirks. I should know, I’ve created software that falls into both these categories.

For people who are new to the world of quirky Linux software, it can be a very frustrating experience, trying for hours to install that toolkit and all its dependencies, only to fail due to obscure issues that would drive a seasoned sysadmin into madness. Fortunately, container technology has completely changed the landscape of software in the last few years. Why bother installing complex software when you can let others do the work?

If you don’t know what I’m talking about, you’re either living under a rock, or you are exactly the kind of person who this article is for. So let’s get started.

What’s a container?

This one is easy, as long as we don’t dive too deep. A container is a self-contained bundle of software, often including a complete OS userland as well as all the libraries and executables needed to run a given application. Ever used a virtual machine on your computer? The containers I’m talking about are very similar, and can in fact be classified as a form of virtualization. A huge advantage over full virtualization, as offered by VMware or VirtualBox, is that containers share the host’s kernel instead of emulating hardware, so they run with much less overhead and thus higher performance.

Docker? Singularity?

Docker and Singularity are both technologies that help you create, run and manage container images. I’m mentioning Docker because it is by far the most widely used container engine, and most bioinformatics software you might want to install is going to have a Docker image. That being said, Docker has a number of issues for the typical user who just wants to analyze the RNA-Seq data they got from the sequencer.

It’s not exactly user friendly if you don’t have at least passable knowledge of computer architecture. Let me give you an example. Let’s say I want to use my salmon container to quantify a set of reads that I can find in my current working directory:
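A minimal sketch of what such a command might look like. The image name, the index directory and the read file names are placeholders I picked for illustration, and I’m assuming the image’s entrypoint is the `salmon` binary. The script only prints the command unless you run it with `RUN=` (empty).

```shell
# Assumptions (placeholders, adjust to your setup): the image name, the
# index directory and the read file names; the image's entrypoint is
# assumed to be the salmon binary.
IMAGE="bschiffthaler/salmon"

# Dry run by default: RUN=echo only prints the command.
# Invoke the script with RUN= (empty) to actually execute it.
RUN=${RUN-echo}

salmon_via_docker() {
  # --rm: delete the container once it exits
  # -u:   run as the calling user so output files are not owned by root
  # -v:   mount the current directory into the container at /data
  # -w:   make /data the working directory inside the container
  $RUN docker run --rm \
    -u "$(id -u):$(id -g)" \
    -v "$PWD:/data" \
    -w /data \
    "$IMAGE" \
    quant -i salmon_index -l A \
    -1 reads_1.fq.gz -2 reads_2.fq.gz \
    -o quant_out
}

salmon_via_docker
```

Everything before the image name is Docker plumbing; only the last three lines are the actual salmon call.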

Now that’s a command line! In all seriousness, that command is going to be scary to someone who hasn’t used containers before. If you don’t know what it all means, that’s maybe a topic for another day. Just trust me that it manages the container’s access to the file system, matches user and group IDs, and so on.

It gives root access to the underlying system. The admins in charge of your university’s cluster are never going to let you run Docker. Why, you ask? Being able to run a Docker container is essentially the same as having root access to every server that container can run on. Here’s an example:
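One way this could look, as a sketch: the process inside a container runs as root by default, and a bind mount hands it the host’s `/etc`. The `alpine` image is just my choice of a small public image with a shell. The block below only prints the command and never executes it, so please keep it that way on any machine you care about.

```shell
# Printed only, never executed. "alpine" is an assumed example image;
# any image that ships a shell would behave the same way.
show_escalation() {
  cat <<'EOF'
docker run --rm -v /etc:/host_etc alpine \
  sh -c 'echo "%users ALL=(ALL) NOPASSWD: ALL" >> /host_etc/sudoers'
EOF
}

show_escalation
```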

What does this do? It uses Docker to give everyone who is part of the `users` group complete access to the entire system.

Enter Singularity

Singularity is a way around the issues of Docker, and it’s fully cross-compatible with Docker to boot! It’s a rootless and daemonless container engine. You can create executables from various container platforms (such as Singularity’s own system, or Docker) and execute them without much fanfare. Did I mention that it also integrates well with HPC queuing systems such as SLURM? Your sysadmins will be all for that.

Let’s take the example from before:

  1. First, we build an executable from my Docker image:
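A sketch of the build step, assuming the Docker image is called `bschiffthaler/salmon` and that we name the output `salmon.sif` (both names are my placeholders). The `docker://` prefix tells Singularity to fetch the image from a Docker registry and convert it. As written, the script only prints the command; run it with `RUN=` (empty) to build for real.

```shell
# Assumed names: swap in the image you actually want to convert.
IMAGE="bschiffthaler/salmon"

# Dry run by default; invoke with RUN= (empty) to execute.
RUN=${RUN-echo}

build_salmon_sif() {
  # docker:// pulls the image from Docker Hub and converts it
  # into a single Singularity image file (SIF).
  $RUN singularity build salmon.sif "docker://$IMAGE"
}

build_salmon_sif
```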

This produces an executable file that, for all intents and purposes, can be treated the same as any other executable on the system. E.g.:
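For instance, assuming the image file is called `salmon.sif` and its runscript hands all arguments to salmon (both assumptions on my part), asking for the version might look like this. The script prints the command by default; run it with `RUN=` (empty) to execute it.

```shell
# Dry run by default; invoke with RUN= (empty) to execute.
RUN=${RUN-echo}

check_salmon_sif() {
  # The image file is called directly, like any other program.
  # Arguments are passed on to the container's runscript.
  $RUN ./salmon.sif --version
}

check_salmon_sif
```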

Now you only need to do this once. Once the image is built, it behaves just like any other executable and can be used as needed. Let’s run that command from before:
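Here is a sketch of the same quantification, again with placeholder names for the image file, the index directory and the reads. Singularity runs the container as the calling user and typically makes the current working directory available inside it, so the user and mount flags are not needed. The script prints the command by default; run it with `RUN=` (empty) to execute it.

```shell
# Dry run by default; invoke with RUN= (empty) to execute.
RUN=${RUN-echo}

quantify_reads() {
  # Placeholder inputs: salmon_index, reads_1.fq.gz and reads_2.fq.gz
  # are expected in the current working directory.
  $RUN ./salmon.sif quant -i salmon_index -l A \
    -1 reads_1.fq.gz -2 reads_2.fq.gz \
    -o quant_out
}

quantify_reads
```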

Now that is clean. As for sources of containers, here are a few where you can probably find what you need:

  • Biocontainers: A huge collection of high-quality images, but their size once converted to Singularity images can be prohibitive. ~370 MB for bwa… ouch
  • DockerHub: Most images are published to this hub, but you will have to do some manual searching and vetting of what you’re getting. I publish my container images here: https://hub.docker.com/u/bschiffthaler
  • singularity-hub: Similar to DockerHub, but with an explicit focus on Singularity recipes. The same caveats apply as for DockerHub.

What’s next?

Next week we’ll get started replicating a full RNA-Seq experiment with these tools. Stay tuned!

Bastian Schiffthaler

Life Science/Genomics/Transcriptomics. PhD in plant molecular biology but please don’t ask me any plant questions