Introduction to Docker


posted by Mosaic Data Science

 

One of Mosaic’s data science consultants recently had the opportunity to get better acquainted with Docker. He was impressed by how simple and powerful it was for a basic use case: packaging a predictive maintenance demo. He knows that others have been researching its capabilities for larger-scale solutions, but his use seemed like a simple, practical case that might be instructive for other first-timers.

 

First, what is Docker? Here are a couple of summary statements from the Docker website (docker.com):

  • “Docker is the world’s leading software containerization platform.”

  • “Docker containers wrap a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.”

 

With that basic outline in place, but with very little additional understanding, he wondered whether Docker might help him. Our consultant needed to set up a Predictive Maintenance demo for an upcoming sales meeting. The demo requires various R libraries that are called via an R-to-Java bridge, and it relays the results through a web application. Previously he had only ever managed to link this all up on Linux, and each time he had needed to dust it off, it had required some configuration work to make sure that all the right libraries were interacting properly. The demo had never really had a permanent home, so it had been tough to get it up and running quickly. Our team discussed setting up a new Linux server for this, but it occurred to our consultant that this might be a case where Docker would be useful. If containers were all that they were made out to be, then he could deploy the demo on any machine for demonstrations and still be able to deploy it to a server of any type, including AWS, if the team wanted to later, without needing to create a new deployment.

So, to build a custom image (a running instance of which is a container), he needed to create a Dockerfile. This is basically a script that walks through all the steps he would need to take to set up his application on a new Linux machine, even though in this case he was working in Windows. He just needed to tell Docker that he wanted to start with a base image and then build onto it. He had it pull a publicly available Ubuntu image (there are many others you can start with on Docker Hub). All commands after that point were issued as though he were working on that hypothetical Ubuntu machine.

Here is a summary of his final file, by line range:


  • Line 1: Pull in a public Ubuntu Docker image as a starting point.
  • Lines 3-13: Install the JDK and R, and set their environment variables.
  • Lines 15-24: Create and run R scripts to install the necessary R packages, and add them to the Java library environment variable.
  • Lines 26-27: Import and unzip the Predictive Maintenance demo build archive.
  • Lines 29-31: Expose a port for the service and declare the command to use as the entry point when this image is run as a container.
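
The original file is not reproduced here, so the following is only a rough sketch of what a Dockerfile following the steps above might look like. It is not the consultant's actual file, and its line numbers do not match the ranges listed above. The Ubuntu tag, the apt package names, the use of rJava/JRI as the R-to-Java bridge, the archive and jar names (pdm-demo.zip, pdm-demo.jar), the paths, and port 8080 are all assumptions made for illustration.

```dockerfile
# Hypothetical sketch only: NOT the original file from the article.
# Start from a public Ubuntu base image (the tag is an assumption).
FROM ubuntu:22.04

# Install a JDK, R (with build tools for compiling R packages), and unzip.
# The ARG keeps apt from prompting during the build.
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
 && apt-get install -y default-jdk r-base r-base-dev unzip \
 && rm -rf /var/lib/apt/lists/*

# Environment variables for Java and R (typical Ubuntu locations).
ENV JAVA_HOME=/usr/lib/jvm/default-java
ENV R_HOME=/usr/lib/R

# Install R packages. rJava (which bundles the JRI bridge) is an assumed
# stand-in for whatever bridge and packages the demo really uses.
RUN R CMD javareconf \
 && Rscript -e 'install.packages("rJava", repos = "https://cloud.r-project.org")'

# Let the JVM find R's shared library and the JRI native library
# (path assumes the default site library used when installing as root).
ENV LD_LIBRARY_PATH=/usr/lib/R/lib:/usr/local/lib/R/site-library/rJava/jri

# Copy in and unpack the demo build archive (hypothetical file name).
COPY pdm-demo.zip /opt/
RUN unzip /opt/pdm-demo.zip -d /opt/pdm-demo \
 && rm /opt/pdm-demo.zip

# Port the web application listens on (assumed).
EXPOSE 8080

# Command run when a container starts; extra arguments given to
# `docker run` after the image name are appended to it.
WORKDIR /opt/pdm-demo
ENTRYPOINT ["java", "-jar", "pdm-demo.jar"]
```

With a file like this saved as Dockerfile next to the archive, `docker build -t pdm-demo .` would build the image and `docker run -p 8080:8080 pdm-demo` would start a container from it. The tag pdm-demo is again just an illustrative name.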

 

He then used a utility within Docker Toolbox, which he had installed on his machine, to build an image from this file. Of course the first few steps take a while to run while Docker downloads Ubuntu, Java, and the R libraries, but he was pleased to see that this time is only required once. Docker caches the result of each step, so if (when) you need to change something and rebuild the image, it reuses the cached results until it reaches your change.

Once an image is built, you can use Docker to run the image as a container. He only needed to run it once or twice to see another great feature of Docker. The team’s demo can be run in a couple of different ways, and he configured the image to accept an argument that is passed along to the Java application to tell it which way to run. If he had created a complete virtual machine to run the demo, then switching modes would mean either stopping the whole thing and bringing it back up again, which is slow (basically a reboot), or having someone manually work within that VM to reconfigure things. Using Docker, however, he can start up a new container with a different argument almost instantly. To him it seemed as though his containerized application started up just as fast as the Java application running by itself, directly on his machine.
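
How he wired up the argument is not shown here. One common way to get this behavior, sketched with hypothetical names, is an exec-form ENTRYPOINT for the fixed part of the command plus a CMD that supplies a default argument:

```dockerfile
# Hypothetical fragment for the end of a Dockerfile; the jar name and
# the mode names are placeholders, not taken from the actual demo.

# Fixed part of the command: always launch the Java application.
ENTRYPOINT ["java", "-jar", "pdm-demo.jar"]

# Default argument, used when `docker run` is given no arguments.
# Anything passed after the image name replaces it.
CMD ["mode-a"]
```

Assuming the image was tagged pdm-demo, `docker run pdm-demo` would start the application with mode-a, while `docker run pdm-demo mode-b` would start a fresh container with mode-b instead, with no rebuild of the image.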

Although the following pictures, from the Docker website, clearly hide some of the magic, they do illustrate how this performance is achieved. The first picture shows how traditional virtual machines might run on a host machine.

[Image: virtual machines running on a host machine, each with its own guest OS]

This second picture shows how containers run on a host machine.

[Image: containers running on a host machine, sharing the host OS]

You can see that in the first picture, each virtual machine requires its own guest OS to be loaded, whereas in the second picture, the containers share the underlying OS of the host machine. In his mind this is something like using Cygwin, instead of booting something like a VirtualBox Linux image, when you want to use grep or some other Linux utility. The main difference here, though, is that the containers are using the actual Linux libraries that came with his base Ubuntu image, not some Windows port. (On Windows, Docker Toolbox supplies the Linux kernel that the containers share by running a small Linux virtual machine in VirtualBox.) The end result is that his Java application is using the same libraries as it would on a true Linux machine.

So where might we go from here? In his mind, Docker seems like it would be very useful for a system where you have three different machines to update with each release. Each machine has slight differences that, at present, we need to adjust for manually during deployment. If we just built a new image for each release that comes fully configured, with the proper data loaded and property file values set, then we could simply copy that single image to each server and run it with Docker. Better yet, if we ever need to set up a new server, we would only need to install Docker; there would be no need to set up PostgreSQL or Java or anything else, since that will all come in the Docker image.

He only learned how to create and run a single container. The real power of Docker comes from the many associated tools for deploying containers in enterprise systems. Although Docker is built on open source technologies, the company behind it offers many optional, paid services for deploying and hosting containers. The same can be said of Amazon, Google, and many other cloud providers. He believes there are also several open source alternatives for hosting your own. Many of these platforms offer features like load balancing and failover, in which multiple parallel containers can be dynamically added or removed as needed. For the purposes of this article, we’ll leave this bigger-picture infrastructure to others among us who have been researching it.

Mosaic can bring these capabilities to your organization. Contact us and mention this blog post.
