Notes

Berkeley Collaborative Environment

Trying out the Berkeley image (Essentially ubuntu 14.04 XFCE with ipython and RStudio installed, but finely tuned to improve user experience; e.g. solid colors for faster remote window connections.)

Highly recommend their paper for the first explanation I’ve actually been able to follow that provides a definition of “DevOpts” (essentially, using scripts rather than documentation to manage consistent cross-platform installation) and explains the differences and similarities between the various programs operating in this sphere, sometimes as alternatives and simultaneously.

Virtual machines.

Their approach focuses on running on top of a complete virtual machine. Primarily considers Oracle’s virtualbox for local use, Amazon’s AMI for cloud use. (For an emerging alternative to the full virtual machine approach, they discuss Docker).

Configuration management (CM) tools:

Ansible (“playbooks”) used by the BCE to specify the complete software environment. Compares to: Chef (Ruby-based, “recipes”), Salt (Python based, “states”), and Puppet (“manifests”).

Provisioning tools:

Packer Used at build time to create a machine image for the VM. Packer can use Ansible/Chef/Salt/Puppet files to do this. Results in a nice AMI for Amazon web console or an image for the virtualbox GUI.
Vagrant Offers a different approach. Rather than the developer creating a VM image using Packer and Ansible script that is ready to run on virtualbox or Amazon, the end user installs vagrant instead of virtualbox. Vagrant also handles the job of Packer, in preparing an environment to run (on Vagrant’s virtual machine, rather than on virtualbox). (Note that conversely, Packer can create a Vagrant virtual machine just as easily as it can create the Oracle virtualbox or Amazon AMI). Vagrant feels a lot more native, as the user works within their familiar OS tools for editing, etc, while vagrant makes sure that the execution of the software happens in a controlled, identical environment.
Docker offers a more modular alternative to a full virtual machine, with performance that is more like running on ‘bare metal’, sharing the kernel of the native OS; though at the moment it requires that be a linux kernel. Docker is typically deployed using Vagrant, (though stand-alone setup is emerging).

From first glance, Vagrant sounds like a more elegant approach than having end users install Oracle’s virtual machine and then learn to live in a virtual window emulating the Ubuntu-XFCE desktop (particularly if those users are developers!). However, it seems that the BCE team found that Vagrant was harder for students to work with, since it required knowledge of the commandline.

The paper also provides a fabulous case study of a major scientific software project using the “DevOpts” philosophy but without any of these new and emerging tools, relying only on scripts, makefiles, and Linux distribution package managers.

Test drive impressions

Unfortunately, no luck getting virtualization running on my laptop (due to BIOS issues). Testing on Ubuntu desktop required a newer version of virtualbox than what’s in the 12.04 repos, but otherwise just worked. It’s straight forward to install additional needed software, and virtualbox gives the option of preserving the machine state on exit, presumably with the software. Not clear how that should be managed to avoid re-creating the problems that using a consistent image set out to avoid in the first place; perhaps requires re-provisioning a divergent image?

Nice experience testing out the Amazon machine image, but I think the workflow would be improved if it provided RStudio server for a more interactive interface.

Misc tasks

Working on reveal.js slides
RNeXML manuscript edits from Francois’s comments

Thoughts on namespaces (in R)

Chatting with Scott about namespace practices, thought I’d put some of this down.

I go back and forth on having a small namespace (particularly vs convenience functions). It is certainly easier to maintain, but from the user’s perspective I think it’s pretty easy to just ignore extra functions; as long as it’s well documented what functions they need to know to get started.

I guess if a user needs to know too many functions to do anything, than it becomes hard to keep track of. That’s partly why I started wrapping exposed functions. For instance, you can just use nexml_write all the time and never use add_characters, add_trees, etc, since nexml_write takes additional trees and characters as arguments.

But I like function calls to be as semantic as possible so the code is more self documenting. Sometimes add_characters(nex) is more self explanatory than nexml_write(nex, characters = characters).

The XML package really got me started on this. There’s usually like 4 or 5 ways to do the same thing. Sometimes that’s really annoying, but sometimes it helps write more transparent code or less verbose code (e.g. adding child nodes with addChildren vs passing as .children argument to newXMLNode – the former tends to be more semantic, the latter often more consise).

Code tricks: vim pandoc

Vim pandoc syntax highlighting

Recognize .Rmd as a pandoc-syntax file extension. In .vimrc do:

au BufRead,BufNewFile *.Rmd set filetype=pandoc

Enable syntax highlighting inside code blocks, by language. In vim session, do: PandocHighlight r. Unfortunately doesn’t recognize the default .Rmd format used by RStudio. This then enables syntax highlighting, folding, etc.

If I recall correctly, knitr’s default markdown syntax,

```{r}

was intended to be a valid markdown syntax. It seems Github Flavored markdown is happy to recognize this as markdown syntax for a code block in the R language, but while pandoc recognizes this as a code block, it does not recognize the language.

I believe Pandoc does recognize the format

```{.r options}

as the notation to specify the R language in this notation. I like this notation because it means that pandoc-aware syntax highlighters will highlight my code chunks as R code, which does not happen in the default syntax.

I realize I could define this as an input hook for knitr, but am reluctant to do so as it makes my code slightly less familiar/less portable to other users (e.g. RStudio expects and integrates with only the standard notation.)