Docker: Repeatability in R&D and Academia

Repeatability is a core consideration for any research environment. Whether the work is public knowledge or internal to a company, a future researcher being able to reproduce your experiment is essential to the scientific method.

Jan. 7, 2018

I have been tinkering with Docker in my spare time and have really come to see the power it brings to the table when working on software projects. So many times, deploying to either my personal server or the Windows servers we have at work has resulted in errors caused by the environment the code is running in or the build steps used to create it. Besides being an extremely frustrating and stressful problem, it is demoralizing to complete a set of work and then have doubts about whether it will actually work when put in production. Docker seems to solve this in such an elegant and complete way, and I am thoroughly enjoying it for my personal work. I just have to get the IT team at work to agree to let me use it there!

I was first really introduced to Docker when working on the Axelrod projects, and their philosophy about it convinced me to adopt it from an operations perspective. Their use cases are academic in nature as well as operational. The primary reason they bring it up is to make the experiment repeatable.

At work, every experiment, the conditions it ran under, lab results, documentation, notes and conclusions are all meticulously documented in a dedicated software tool. I can search "nylon crystallization" and it will come back with every experiment that contains those keywords. It is an essential tool for our company to continue being innovative and make progress in our field. From running process experiments with the R&D teams at work, as well as spending a lot of time in labs in college, I have learned that repeatability is a core facet of experimentation. If we can repeat an experiment, we can build on top of it and push forward.

The beauty of Docker is the ability to build an environment, use it locally and then literally take the exact same environment (not a copy, not a re-creation, but the image itself) and use it in a different location. On top of that, even if you do need to re-create (i.e. build) the image, the Dockerfile describes every step needed to do so, and as long as the base image and dependency versions are pinned, I have basically complete confidence that the result will be an equivalent environment.

More and more of academia is turning to programming as a solution to some difficult problems, and based on talking to some of my friends who are PhD students doing research, I believe this trend will only continue. Reproducing an experiment in this field can be difficult because of the large array of operating systems, versions and software installed on them. I am interested to see if Docker grows to solve that problem as it is growing to solve a similar issue in software operations.
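
To make the "operating systems, versions and installed software" problem a bit more concrete, here is a minimal sketch in Python of how one might record a fingerprint of the environment an experiment ran in, so that two runs can be compared later. The function names `describe_environment` and `fingerprint` are hypothetical, made up for this illustration, and the sketch only looks at the Python side of things (it assumes Python 3.8 or newer). A Docker image captures far more than this, which is rather the point.

```python
import hashlib
import json
import platform
from importlib import metadata


def describe_environment():
    """Collect details that commonly differ between two machines (illustrative only)."""
    # Installed Python packages as "name==version" strings, de-duplicated and sorted
    # so that the description does not depend on discovery order.
    packages = sorted(
        {f"{dist.metadata['Name']}=={dist.version}" for dist in metadata.distributions()}
    )
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": packages,
    }


def fingerprint(env):
    """Hash an environment description into one short, comparable string."""
    # sort_keys gives a canonical JSON form, so equal descriptions hash equally.
    canonical = json.dumps(env, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    env = describe_environment()
    print("environment fingerprint:", fingerprint(env))
```

If two researchers get different fingerprints, something in their setups differs, and any mismatch in results might come from the environment rather than the experiment. Shipping the experiment as a Docker image sidesteps the comparison entirely, because both people run the same image.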