I was recently asked to describe what my typical workflow would look like for running R on a cloud machine, such as DigitalOcean.
So, here’s a typical use:
    ## Create DigitalOcean machine
    docker-machine create --driver digitalocean --digitalocean-size 1gb --digitalocean-access-token "$DO_TOKEN" dev

    ## Point docker-engine at the new machine
    eval "$(docker-machine env dev)"

    ## Launch the Rocker container running RStudio
    docker run -d -p 8787:8787 -e PASSWORD="$PASSWORD" rocker/hadleyverse

    ## Open browser at the IP address (e.g. mac-based terminal command)
    ## RStudio Server in this container speaks plain HTTP on port 8787
    open "http://$(docker-machine ip dev):8787"
From there, log in with username “rstudio” and the password you chose (it will default to “rstudio” if none was given) and you’re good to go. RStudio has a nice git GUI, which is a good way to move content on and off the instance, but I would also teach docker commit and docker push at the end as a way to capture the whole runtime (e.g. any packages installed, cached binaries from knitr, etc.):
    docker commit <hash> username/myimage
    docker push username/myimage
(Use the private image slot if you don’t want your image and results to be public.)
It would probably be good to cover the non-interactive case too, which is helpful when you want to do a really long run of something that needs a bigger computer than your laptop. IMHO this is where docker-machine excels, because it’s easy to have the machine shut down when everything is finished. For example:
    ## As before
    docker-machine create --driver digitalocean --digitalocean-size 1gb --digitalocean-access-token "$DO_TOKEN" dev
    eval "$(docker-machine env dev)"

    ## Let's use the committed image since it can already have my script included
    docker run --name batch username/myimage Rscript myscript.R

    ## Commit using container name & push back up
    docker commit batch username/myimage
    docker push username/myimage

    ## Script removes the machine now
    docker-machine rm -f dev
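One caveat with a script like this: if the R script fails part-way, the final docker-machine rm never runs and the machine stays up. Here is a minimal sketch of one way around that, using a shell trap so the machine is removed however the run ends. The function names run_batch and cleanup, and the MACHINE and IMAGE variables, are my own placeholders rather than anything docker-machine provides.

```shell
#!/bin/sh
# Sketch only: MACHINE, IMAGE, cleanup and run_batch are hypothetical names.
MACHINE="${MACHINE:-dev}"
IMAGE="${IMAGE:-username/myimage}"

# Remove the cloud machine; called automatically when run_batch exits.
cleanup() {
  docker-machine rm -f "$MACHINE"
}

# The body runs in a subshell, so the trap and the docker-machine
# environment variables do not leak into the calling shell.
run_batch() (
  trap cleanup EXIT
  docker-machine create --driver digitalocean \
    --digitalocean-size 1gb \
    --digitalocean-access-token "$DO_TOKEN" "$MACHINE" &&
  eval "$(docker-machine env "$MACHINE")" &&
  docker run --name batch "$IMAGE" Rscript myscript.R &&
  docker commit batch "$IMAGE" &&
  docker push "$IMAGE"
)
```

Calling run_batch with DO_TOKEN set runs the whole job. Because the steps are chained with &&, a failed R script skips the commit and push, but the trap still removes the machine.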
Clearly that can be modified to pull scripts down with git and push results back up to GitHub, rather than using docker commit to snapshot and push the whole image every time. Just add environment variables for authentication.
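A rough sketch of what that git-based variant might look like. GITHUB_USER, GITHUB_PAT and REPO are placeholder variable names of my choosing (set them in your shell first), and I'm assuming the image has git installed and that myscript.R lives at the top of the repository. Since the script now comes from git, the stock rocker/hadleyverse image is used instead of a committed one.

```shell
#!/bin/sh
# Sketch only: GITHUB_USER, GITHUB_PAT, REPO and myscript.R are placeholders.

# Print the commands to run inside the container. The quoted 'EOF' keeps
# the variables unexpanded here, so they are resolved inside the container.
container_script() {
  cat <<'EOF'
set -e
git clone "https://${GITHUB_USER}:${GITHUB_PAT}@github.com/${GITHUB_USER}/${REPO}.git" /work
cd /work
Rscript myscript.R
git config user.name "${GITHUB_USER}"
git config user.email "${GITHUB_USER}@users.noreply.github.com"
git add -A
# Only commit when the run actually changed something
git diff --cached --quiet || git commit -m "Results from batch run"
git push origin HEAD
EOF
}

# "-e NAME" with no value passes the variable through from the calling shell.
run_batch_with_git() {
  docker run --rm \
    -e GITHUB_USER -e GITHUB_PAT -e REPO \
    rocker/hadleyverse \
    sh -c "$(container_script)"
}
```

The container is thrown away with --rm once the push has finished, so the results live only in the repository and nothing needs to be committed to an image.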
Provided runs take less than an hour, this workflow can rather easily be applied to a continuous integration system such as Travis or CircleCI (both of which support Docker), instead of using docker-machine to create a private cloud instance. This provides a very simple way to bring continuous integration testing and content (re)generation to individual research results.
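On a CI service there is no machine to create or remove, so the job shrinks to a single docker run. A minimal sketch of a helper the CI configuration could call, assuming the service has already checked out the repository into the current directory and has Docker available. The function name ci_run and the /work mount point are hypothetical.

```shell
#!/bin/sh
# Sketch only: ci_run and /work are hypothetical names.
# Mounts the checked-out repository into the container and runs one R script
# there, so any files the script writes end up back in the CI workspace.
ci_run() {
  docker run --rm \
    -v "$PWD":/work -w /work \
    rocker/hadleyverse \
    Rscript "${1:-myscript.R}"
}
```

For example, ci_run analysis.R would run analysis.R from the repository root. With no argument it falls back to myscript.R.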