Lab Notebook

Entries

Learning Jekyll

30 Dec 2012

Some quick thoughts on learning Jekyll

After receiving a few queries on how to get started with Jekyll, I thought I’d jot down my own opinions here. Of course, there is already rather good documentation for the software, and plenty of blogs provide tutorials. Those are no doubt more than adequate, but I have more opinions on how to actually learn what’s going on than I could easily fit into a tweet, along with a couple of favorite reference links, so here is my reply in long form.

Brief Background

Jekyll is a fantastic and popular static blogging platform that powers this site (at least currently). It is written in Ruby by a Github founder, Tom Preston-Werner. Similar static website generators exist for Python (Hyde), Haskell (yst), and probably other languages, but Jekyll is quite widespread.

With the (unofficial) slogan of the blogging platform for hackers, Jekyll is aimed at folks who don’t mind a little coding. If you’re looking for a polished, user-friendly product where a huge community of developers has written most imaginable extensions, check out Wordpress.org. If you want something where you can easily get into the guts and customize everything, Jekyll may be the best game in town. Compared to a dynamic site, a static site is much more lightweight, which translates into a site that is faster, more stable, and less expensive to host. What’s not to like?

Where to start?

A couple of pre-configured Jekyll setups have emerged to reduce the learning curve, most notably Octopress and Jekyll Bootstrap. In my opinion, by trying to make things easier through tricks such as automated Ruby makefiles (Rakefiles), these projects have actually made Jekyll appear much more complicated than it really is, as evidenced by the number of talented programmers running Octopress with the identical configuration. They may get you started faster, but their use of advanced techniques to provide more functionality out of the box makes for a tougher learning curve.

See Jekyll-Bootstrap’s introduction to Jekyll for a nice explanation of how Jekyll is laid out. Understand the basic YAML headers and the roles of the _layouts, _includes and _posts directories, and you’ve mastered the basics. Or read on below for my take on getting started.

Running Jekyll

For a basic site hosted on Github, one never actually has to run Jekyll locally – just push the site to a gh-pages branch of a Github repository, or to the master branch of a repository named username.github.com for a personal website, and Jekyll will be run by the server. Jekyll is frequently run locally instead, allowing the user to copy the static website it generates to any (static) web server (see Hosting options, below). Simply install Jekyll following the directions on its homepage README.[^1]

My rough guide to learn-by-doing

I believe the best way to start with Jekyll is to start with a really bare-bones site. You add features only as you need them, which keeps maintenance easy.

Start with an empty repository. Create a markdown file, say, index.md. Write whatever you want in standard markdown, just stick a YAML header on the top. Stick what? From your standardized testing days, think something like markdown:html :: yaml:xml. Three hyphens (---) on their own line open the block of YAML data, and three more on another line close it. The mere existence of this block is enough to get Jekyll to pay attention to that file. Here we specify any metadata we want.

    ---
    layout: default
    ---

    # My site title

    some text

And voilà, we have a functional Jekyll site. The basic thing to remember is that everything else is gravy, to be mastered as you have a need for it. The advanced features can be super helpful once you know them, but in the beginning they may just get in the way of understanding what’s actually going on.
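To make the role of that YAML header a bit more concrete, here is a rough Python sketch of how a site generator might split the header from the body of the example page above. The function name split_front_matter is hypothetical, it only understands flat key: value lines, and it is not how Jekyll itself (which is written in Ruby and uses a real YAML parser) does the job.

```python
def split_front_matter(text):
    """Split a page into (metadata dict, body). Handles only flat `key: value` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None, text  # no header block: not treated as a page
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":  # closing marker found
            meta = {}
            for line in lines[1:i]:
                if ":" in line:
                    key, value = line.split(":", 1)
                    meta[key.strip()] = value.strip()
            body = "\n".join(lines[i + 1:]).lstrip("\n")
            return meta, body
    return None, text  # opening marker was never closed


page = "---\nlayout: default\n---\n\n# My site title\n\nsome text\n"
meta, body = split_front_matter(page)
print(meta)
print(body)
```

Everything between the two marker lines ends up as metadata, and everything after them is the content that gets converted to HTML.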

Getting deeper: Jekyll site file structure

Directories that are functional for Jekyll but do not appear in the generated website start with an underscore _.

  • Root directory. You don’t need any of the standard directories for a barebones site. As we saw above, any markdown file with YAML header matter in the root directory will be parsed. Don’t like markdown? We could write these files in html (with .html extension) instead. Jekyll will parse them as long as they have some YAML header matter, allowing us to still use things like page templates and other tricks from Liquid code described below.

  • _posts Your posts live here. They should be titled in YYYY-MM-DD-post-title.md format, which will allow Jekyll to automatically figure out the post date and title, while also keeping the posts neatly organized in this directory. (The title and the publication date, optionally including the time, can alternatively be given in the YAML header matter if you don’t like this convention. See below.)

  • _layouts For a completely barebones site, you don’t need this directory. Here you can provide files defining the basic HTML layout, or template, for each of the pages in your website. If you’re familiar with HTML, you can go ahead and set up a page as you like, and then add the single line of Liquid code in the body to pull in the HTML generated by parsing the markdown files. The minimal layout file might look something like this (see an HTML tutorial for details):

<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title> {{ page.title }} </title>
</head>
<body>

{{ page.content }}

</body>
</html>

where the double braces and words page.title and page.content are Liquid code that will tell Jekyll to insert the title and content of the page in the appropriate spot. To use a layout, save this file in _layouts with the desired layout name, e.g. plainlayout, and then it can be applied to any page (markdown or html file in the root directory) or post by adding layout: plainlayout to the YAML header of that file. This provides a super convenient way to create page templates for a static site.

  • _includes Often you might have several layouts corresponding to different kinds of pages on your site. To avoid copy-pasting the parts of the layout that stay the same (perhaps all have the same header matter pointing to your CSS files and analytics tracker), you can just place these snippets inside the _includes directory with their own filenames. For instance, we could move the header text to a file called header in that directory with the contents:
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title> {{ page.title }} </title>
</head>

To automatically reuse a snippet of HTML code in any file on your site (a layout file or other), we need to learn one more Liquid command. This one goes between brace-percent symbols, and uses the command include, followed by the filename, like so:

{% include header %}
<body>
{{ page.content }}
</body>
</html>

The above example provides a shorter layout file, which pulls in the header snippet. If you have only one layout, this is of course pointless, but the more you add to and change your site, the more useful the ability to reuse a single HTML chunk with just a line of code becomes.
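If it helps to see the moving parts together, the following is a toy Python imitation of what happens to a post: the date and a default title are read from the filename, the include tag is expanded, and the page variables are substituted. The function names, the regular expressions, and the sample post are all made up for illustration. Real Jekyll does this in Ruby with the full Liquid engine, which handles far more than these two tags.

```python
import re
from datetime import date

POST_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(md|markdown|html)$")


def parse_post_name(filename):
    """Recover the date and a default title from YYYY-MM-DD-post-title.md."""
    match = POST_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a post filename: {filename}")
    year, month, day, slug, _ext = match.groups()
    title = slug.replace("-", " ").capitalize()
    return date(int(year), int(month), int(day)), title


def render(template, includes, page):
    """Expand {% include name %} tags first, then {{ page.key }} variables."""
    expanded = re.sub(r"\{%\s*include\s+(\S+)\s*%\}",
                      lambda m: includes[m.group(1)], template)
    return re.sub(r"\{\{\s*page\.(\w+)\s*\}\}",
                  lambda m: str(page[m.group(1)]), expanded)


# Stand-ins for a file in _includes and a layout file
includes = {"header": "<!doctype html>\n<html lang=en>\n<head>\n"
                      "<meta charset=utf-8>\n<title> {{ page.title }} </title>\n</head>"}
layout = "{% include header %}\n<body>\n{{ page.content }}\n</body>\n</html>"

post_date, title = parse_post_name("2012-12-30-learning-jekyll.md")
page = {"title": title, "date": post_date, "content": "<p>some text</p>"}
html = render(layout, includes, page)
print(html)
```

Because the include is expanded before the variables are filled in, the {{ page.title }} tag inside the header snippet is resolved just like one written directly in the layout.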

Configuration

  • YAML and the _config.yml file. Recall that YAML is a very human-readable data format, whose syntax is explained quite thoroughly on its Wikipedia page. One glance at an example reveals almost all you need to know about the syntax, such as:

---
layout: post
category: ecology
tags: 
  - howto
  - jekyll
modified: 2013-01-07
---

Lists can be done in a variety of ways, comma-s

One of the Jekyll beginner’s mysteries is knowing just what words/metadata/variables are already available in Jekyll and how to add those that are not. A list appears on the Jekyll Wiki. Note that the syntax uses the Ruby structure, so page.title and page.date are both part of the page. We can create any custom variable by adding it to that page’s YAML header. For instance, modified is not defined anywhere in Jekyll’s default Template Data, but nonetheless we can access this modification date in this page’s layout or the page itself using the {{ page.modified }} Liquid tag.

Metadata belonging to the site as a whole, rather than a particular page, is specified in the _config.yml file, along with a few options telling Jekyll how to parse the site. Again, the Jekyll wiki has a great overview. As the extension implies, this entire file is in YAML. If we want to create new global variables, such as our twitter account, we can just add them to the YAML like so:

author:
  twitter: cboettig
  github: cboettig

(note the indentation for hierarchy) and now my twitter username is available through the Liquid variable site.author.twitter. Again, it’s just another trick to insert some text, just like _layouts and _includes. Only use can prove how helpful this can be.
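The dotted names are nothing more than a walk down nested key-value data. Here is a small Python sketch of that lookup. The lookup function and the context dictionary are illustrative stand-ins, not Jekyll or Liquid internals.

```python
def lookup(context, dotted_name):
    """Resolve a dotted, Liquid-style name such as 'site.author.twitter'."""
    value = context
    for key in dotted_name.split("."):
        value = value[key]  # step one level down the hierarchy
    return value


# site.* mirrors _config.yml; page.* mirrors one page's YAML header
context = {
    "site": {"author": {"twitter": "cboettig", "github": "cboettig"}},
    "page": {"layout": "post", "category": "ecology", "modified": "2013-01-07"},
}

print(lookup(context, "site.author.twitter"))
print(lookup(context, "page.modified"))
```

A custom variable such as modified needs no registration anywhere: once it is in the data, the same lookup finds it.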

Markdown parser hell. A lot of Jekyll adopters are already familiar with Github and its flavor of markdown, with such features as fenced code blocks, so it comes as a real surprise to run into advice explaining that code blocks need to be wrapped in the never-heard-of “Liquid” syntax {% highlight %}. Ignore this nonsense. How your markdown is parsed depends on what flavor of markdown you use, and there is a special level of hell for anyone inventing a new flavor (Atwood 2012). Fortunately Github’s Tom Preston-Werner finally gave up the perfect irony that Github’s flavor was not one of the built-in Jekyll parsers (though this being a hacker system, it could be added manually), and you can now get that behavior by setting your chosen parser to be redcarpet with the convenient extensions:[^2]

markdown: redcarpet
redcarpet:
  extensions: ["no_intra_emphasis", "fenced_code_blocks", "autolink", "tables", "with_toc_data"]

More advanced tricks

Liquid Your next stop should really be to read Liquid for Designers. Liquid is the glue that specifies all the rules needed to assemble your site, and is quite simple. Understand how Liquid works and you’ve mastered Jekyll. Jekyll provides some additional Liquid functions common for blogs.

Your next stop is to get some really nice customizable CSS scaffolding to control the layout for your site. While there’s a huge number of free CSS themes available, this isn’t 2002, so head on over to Twitter-Bootstrap (a project originally created by some of the programmers at Twitter, now evolved into its own community). The CSS-based grid layout and some very nice jQuery JavaScript tools can make your site as pretty and as interactive as you can imagine. As with Octopress, prebuilt CSS can lead to all sites looking the same. Check out themes for Bootstrap, and the excellent set of icons by FontAwesome.

Your final step is where the hacking really begins. Now and again you imagine something you just can’t do with these tools. Your only recourse is to ask on Stackoverflow for someone to do it for you or, better, to learn a bit of Ruby. Anything you can write in Ruby you can add to Jekyll as a “plugin”, by defining a Liquid extension for a Ruby function. Ruby is clearly a favorite language of web developers thanks to its dynamic-site implementation Ruby-on-Rails and a vibrant Github community, so there’s a wealth of useful tools, including an implementation for most APIs you might care to interact with (Github, Twitter, and various Google APIs are among the ones I’m using).

Hosting options

Because Jekyll generates a static site, it can be hosted almost anywhere for nothing or next to nothing in costs. The primary question to consider is your choice of domain name. If you wish to host your site from a university domain name, just copy the contents of _site into your public_html or similarly named directory on the web server.

While having a university affiliated domain can appear more official and help with discoverability, buying your own domain name as your permanent Internet home may be a more reliable long-term solution. Once you’ve purchased a domain name from a provider, a static site can be hosted at your own domain name for next to nothing on Amazon S3, or for literally nothing through Github.

[^1]: Details on installing and running Ruby gems for Jekyll will vary by operating system, so I won’t bother with more details here. Some further background, for Linux-based platforms at least, appears in my site’s info file.




Notes

27 Dec 2012

Nonparametric Bayes

Continuing sensitivity analysis

The commit log to sensitivity.md and earlier to myer-exploration.md (now deprecated) captures the summary figures for replicate runs of the observation data, with commits corresponding to various parameter configurations, etc. Here is a nice collection of replicates from sensitivity.md.

  • Harvest during observations Tweaked calculation of observation data given the harvest regime under which observations were taken (simulation was not implementing all harvests).

  • Also adjusted the norm used in the GP (the parametric models use log-normal densities in calculating the transition function on the untransformed data; the GP, also on untransformed data, uses normal noise as per its model).

  • Relaxed priors, shows no impact on GP performance.

Non-stationary dynamics

  • Ran the Allen et al. Allee model under conditions for non-stationary stable states (e.g. period 2, period 4, etc.), with and without substantial process noise (to confirm the cyclical pattern). Shows good performance of the GP against terrible performance of the alternative parametric approaches. Closes issue #16.

[Figure: plot of chunk unnamed-chunk-2]
[Figure: plot of chunk unnamed-chunk-4]

Commit log links to some additional examples:

  • another allee example altering MLE initial condition to attempt better likelihood estimates. GP still outperforms. 04:40 pm 2012/12/27
  • in oscillatory regime with non-negligible noise. again GP performs nearly optimally while MLE methods suffer greatly 04:27 pm 2012/12/27
  • Example of Ricker-Allee in oscillation regime, shows non-trivial dynamics, GP does very well (closes #22) 04:17 pm 2012/12/27
  • work with non-named arguments in MLE, consistent ordering for plot legends 04:05 pm 2012/12/27

Other projects

  • Upgraded gems, had to update notebook’s liquid code metadata to handle dates in string format on pages vs being date objects in posts. de4d4cc

  • Setting up multiple_uncertainty as a separate repository (branched from pdg-control).




Notes

23 Dec 2012

  • added generation of observational data under varying harvest conditions (issue #19)

  • added MLE fit based on the data-generating model (issue #20)

  • GP plot with and without nugget variance (issue #17)

  • Run longer simulations under policy such that sustainable profits clearly beat out collapsing the fishery (part of issue #22)




Progress Summary

21 Dec 2012

Progress mid-October through mid-December

80% Time: nonparametric-bayes

The bulk of my time has been spent becoming familiar with the literature on Gaussian Processes and their numerical implementation. I have written my own Gaussian process code from scratch to convince myself of my understanding of the methodology, and explored some of the related numerical issues addressing computational speed and stability.
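For readers unfamiliar with the method, the core of a from-scratch Gaussian process fit can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's actual code: it assumes a zero-mean prior, a squared-exponential covariance with fixed hyperparameters, and a small nugget variance, and the observations are made up. The linear solve uses simple Gaussian elimination rather than the Cholesky factorization a serious implementation would prefer for speed and stability.

```python
import math


def sq_exp(x1, x2, length=1.0, variance=1.0):
    """Squared-exponential (Gaussian) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_predict(x_obs, y_obs, x_new, noise=1e-2, kernel=sq_exp):
    """Posterior mean and variance of a zero-mean GP at one new input."""
    K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_obs)]
         for i, a in enumerate(x_obs)]
    k_star = [kernel(a, x_new) for a in x_obs]
    alpha = solve(K, y_obs)    # (K + noise * I)^-1 y
    weights = solve(K, k_star)  # (K + noise * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, var


# Made-up observations: stock size this year -> stock size next year
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [0.0, 0.8, 1.2, 1.4, 1.5]

mean_at_data, var_at_data = gp_predict(x_obs, y_obs, 2.0, noise=1e-4)
mean_far, var_far = gp_predict(x_obs, y_obs, 50.0, noise=1e-4)
print(mean_at_data, var_at_data)
print(mean_far, var_far)
```

Near the observations the posterior hugs the data with little uncertainty, while far from them it reverts to the prior mean and variance. That reversion matters once the inferred dynamics are handed to a decision algorithm.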

I then developed an algorithm for applying the Gaussian process inference of the state dynamics to stochastic dynamic programming methods for determining an optimal harvesting policy. I have compared the performance of the Gaussian-process inferred model to the optimal solution (given the exact underlying model) and to solutions based on estimated parametric models (both matching and not matching the underlying structure). This analysis shows the benefit of the nonparametric approach in better accounting for structural uncertainty in the underlying dynamics.

Steve and I have just started to look into the potential for state shifts and warning signals there-of in the ENSO / Pacific decadal oscillation. I obtained 2 appropriate datasets and have run a very preliminary analysis.

20% Time: Independent projects

  1. pdg-control Working group paper. Updated analysis for Figures 3 and 4. Updated draft. Conference call for feedback from working group participants. (Current draft)
  2. Wrote comment piece on Early Warning Signals for Nature.
  3. Working on EWS review paper for Theoretical Ecology with Noam Ross. (Current draft)
  4. Working on analysis optimal policies under multiple uncertainties (with Jim Sanchirico, Mike Springborn). #multiple-uncertainty tag in notebook; currently analyses are part of pdg-control repository.
  5. Finished and submitted NSF post-doc application

Other academic activities

  • Wrote requested reviews for Conservation Letters, Ecology Letters, Proceedings of the Royal Society B. Declined to review for Systematic Biology.
  • Presented my exit seminar in the UC Davis Center for Population Biology Colloquia series.
  • Interviewed for the Nature column Turning Point.
  • Invited to and attended the PLoS alt-metrics workshop and hackathon.
  • Ran two sessions on version management, dynamic/reproducible papers, and building academic websites for the Mangel group.
  • Presented scaling from individual to population level models through the van Kampen expansion in Steve’s Applied Math Club
  • Participated regularly in applied math club meetings.
  • Established communication with FishBase team, arranged a coalition of developers interested in API access.
  • ongoing development in rOpenSci project

Goals for January - March

80% Time

Writeup of the Nonparametric Bayesian approach to management paper.

Initial goal for manuscript is proof of concept piece with the aim of demonstrating the value of a nonparametric approach to ecological decision-making. Alternatively a more technical piece could focus on the details of using Gaussian processes models for stochastic dynamic programming.

  • Outline of manuscript: Dec 31st
  • Decide on manuscript figures to include
  • Complete draft
  • Appendices
  • Solicit friendly revisions
  • Submit (by March 31?)

Next steps and Additional projects

  • The multi-species context is probably the next goal. Depending on timing, Steve may take the lead on that in these three months; I aim to be working in this in March. The step after will be the active/reinforcement learning/approximate SDP approaches. I aim to start into the background reading for that material but real progress on that front will happen after March.

  • Steve and I will likely continue to do something with the ENSO / PDO analysis as well. (Currently waiting on Steve’s pending queries to domain experts for further discussion)

  • I will also need to determine my summer conference schedule during this interval, and submit abstracts, etc. I have committed to the symposium to which I was invited at SIAM in San Diego, July 8 - 12, and plan on attending ESA.

20% Time goals

  • Finish and submit EWS review paper: end of January
  • Finish and submit pdg-control policy costs paper (see current issues)
  • Finish the methods section for the Labrids paper
  • Finish my pending review for Environmental Modeling and Assessment.
  • Skype with pdg-control Working group, basic follow-up tasks.
  • Some next steps on multiple uncertainty.

Additional academic goals

  • Present in mega-group meeting: February 20th
  • Join the phylotastic hackathon remotely(?)




Results Comparing GP to Parametric

20 Dec 2012

Comparison of the Gaussian process inference to the true optimum and a parametric estimate.

Comparison across 100 simulations under the policies inferred from each approach shows the nearly optimal performance of the GP and the tragic crashes introduced by the parametric management.

Sensitivity analysis

Working through an exploratory sensitivity analysis to see GP performance over different parameters.

Distribution of yield over replicates shows the parametric model performing rather poorly, while most of the GP replicates perform nearly optimally.

larger growth-rate parameters:

from the commit log

next goals/issues




Exploring GP Model Space

19 Dec 2012

Trying to think about a more systematic way to go about varying the parameters: the underlying parametric model has 3 parameters for the stock-recruitment curve’s deterministic skeleton, plus growth noise. (My first exploratory phase has been just to try different things. See my various tweaks in the history log. Clearly time to be more systematic about both running and visualizing the various cases.)

Should I just choose a handful of parameter combinations to test? (Trying to think of a way to do this that is easy to summarize – at least I can summarize expected profit under each set). Presumably, for each set of these parameters, I’d want a few (many?) stochastic realizations of the calibration/training data.
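One way to organize this is a full grid over a handful of values per parameter, with several seeded replicates of the training data for each combination. The sketch below is in Python purely for illustration. The recruitment curve, the parameter names (r, theta, K, sigma), and all values are hypothetical placeholders rather than the ones used in these experiments.

```python
import itertools
import random


def simulate(r, theta, K, sigma, seed, n_obs=40, x0=1.0):
    """Training data from a three-parameter recruitment curve with log-normal growth noise."""
    rng = random.Random(seed)  # one generator per run keeps replicates reproducible
    x, series = x0, []
    for _ in range(n_obs):
        series.append(x)
        x = rng.lognormvariate(0.0, sigma) * r * x ** theta / (1.0 + x ** theta / K)
    return series


# Hypothetical values, chosen only to illustrate the bookkeeping
grid = {"r": [1.5, 2.0], "theta": [1.0, 2.0], "K": [5.0, 10.0], "sigma": [0.05, 0.2]}
n_replicates = 3

runs = []
for values in itertools.product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    for rep in range(n_replicates):
        seed = len(runs)  # unique per run, so any run can be regenerated exactly
        runs.append({"params": params, "replicate": rep, "seed": seed,
                     "data": simulate(seed=seed, **params)})

print(len(runs), "training sets")
```

Keeping the parameters and seed alongside each data set makes it straightforward to summarize expected profit per parameter set afterwards, and to rerun any single case.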

Would it be worth digging up some real-world data-sets and base the selection of underlying model parameters on them?

Then there’s a variety of nuisance parameters: grid size, discount rate, price of fish (non-dimensionalization eliminates that one I guess), cost to fishing (and whether the cost is on effort or harvest, whether linear or quadratic, etc), harvest grid size / possible constraints on maximum or minimum allowable levels for the control; length of the calibration period (and related dynamics if we use any of the variable fishing effort models you showed me today).

Additionally there are the MCMC-related nuisance parameters: parameters for the priors, possibly hyperpriors, and the MCMC convergence analysis (selecting the burn-in period, currently 2000 steps out of 16000, etc.). There are also the distributional shapes for the priors and, perhaps more meaningfully, the GP covariance function (using Gaussian for simplicity, but might want to look at Matern, and the various linear + Gaussian covariances).
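For reference, the candidate covariance functions differ mainly in how quickly correlation decays with the distance r between two inputs. The Python sketch below writes out the standard Gaussian (squared-exponential) form and the Matern forms with smoothness 3/2 and 5/2. The function names and the unit length-scale and variance defaults are placeholders, not values from these runs.

```python
import math


def gaussian_cov(r, length=1.0, variance=1.0):
    """Gaussian (squared-exponential) covariance at separation r."""
    return variance * math.exp(-0.5 * (r / length) ** 2)


def matern32_cov(r, length=1.0, variance=1.0):
    """Matern covariance with smoothness 3/2."""
    a = math.sqrt(3.0) * abs(r) / length
    return variance * (1.0 + a) * math.exp(-a)


def matern52_cov(r, length=1.0, variance=1.0):
    """Matern covariance with smoothness 5/2."""
    a = math.sqrt(5.0) * abs(r) / length
    return variance * (1.0 + a + a * a / 3.0) * math.exp(-a)


for r in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(r, gaussian_cov(r), matern32_cov(r), matern52_cov(r))
```

The Matern forms drop off faster at short range but keep heavier tails than the Gaussian, which corresponds to rougher sample paths. Whether that matters for the inferred recruitment curve is exactly the kind of thing the sensitivity runs would need to show.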

New and progressing issues

from the commit log today

Misc: ropensci




Random_ews_example

17 Dec 2012

ENSO EWS

Let’s just see what happens with the MEI data for PDO:

dat <- read.table("https://www.carlboettiger.info/assets/data/mei.csv", header=TRUE)

For the moment let’s ignore annual structure and just collapse this into timeseries sampled bimonthly.

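library(reshape2)       # provides melt()
library(earlywarnings)  # provides generic_ews()

# Stack the per-season columns into one long series
dt <- melt(dat, id = "YEAR")
# melt() stacks column by column, so re-sort by year (assumes dat is wide, one column per season)
dt <- dt[order(dt$YEAR, dt$variable), ]
X <- dt$value
Z <- X[!is.na(X)]
# Two columns: time index, then observed value (the last point is dropped)
Z <- data.frame(1:(length(Z) - 1), Z[1:(length(Z) - 1)])
png("mei.png")
a <- generic_ews(Z, detrending = "gaussian")
dev.off()  # close the device so mei.png is actually written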

How about the data from MacDonald (2005)

original data link

dat <- read.table("https://www.carlboettiger.info/assets/data/pdo-macdonald2005.csv", header=TRUE)
png("macdonald2005.png")
require(earlywarnings)
a <- generic_ews(dat, detrending="gaussian")
dev.off()




Nonparametric Bayes Comparisons

15 Dec 2012

Notes on structure of examples

Externalized code so that different example scripts call identical commands for fitting and plotting to avoid duplication of code. Example scripts tend to be pretty text poor at the moment, would probably benefit by better descriptions in the markdown. May still involve redundant and potentially out of date descriptions though.

Currently, externalized code for GP comparison experiments is in gaussian-process-control.R, and is called by may79-example.Rmd, myer-example.Rmd, and reed-example.Rmd.

Also fixed knitcitations to at least include citations in markdown files.

Performance of GP compared to Ricker on an underlying Allee model

Using stationary data only

[Figure: Inferred GP]
[Figure: plot of policy functions]
[Figure: fish stock dynamics]
[Figure: harvest dynamics]

(see myer-example.md for code and more graphs & details)

Using more data

[Figure: Inferred GP]
[Figure: plot of policy functions]
[Figure: plot of sim-fish]
[Figure: plot of sim-harvest]

(from earlier myer-example.md)

Comparing GP to estimated Ricker on BH dynamics:

GP performs well, but Ricker performs adequately as well: reed-example.md (shows an example with stationary data).

[Figure: plot of chunk gp-plot]
[Figure: plot of chunk policy_plot]
[Figure: plot of chunk sim-fish]
[Figure: plot of chunk sim-harvest]

Other advances

Approaches shown above address a variety of issues as well:

  • Compares to a Ricker estimated by MLE (#8)
  • Considers fewer non-stationary observations, but includes 0,0 as an observation (#13)
  • Includes scrap value (#10), though this does not guarantee no fishing under non-persistence estimation

A few next steps:

  • measurement error should be introduced in the simulations (#11)
  • Evaluate GP performance under larger noise conditions. (#14)

See issues log for further issues to explore and more details on closed issues. See commit log for full details, this entry summarizes progress over the latter half of the week.




Nonparametric Bayes Notes

11 Dec 2012

  • Code gp_transition_matrix for generic multi-dimensional case

Understanding Gaussian Process performance

If the estimated recruitment dynamics correspond to population dynamics that are non-persistent (might call this non self-sustaining, but in a rather stricter sense than when Reed (1979) introduced that term), and if no reward is offered at the terminal time point for a standing stock (zero scrap value), the GP dictates the rather counter-intuitive practice of simply removing the stock entirely.

Exploring this by comparing the evolution of the probability density for the population size under the transition function. Consider this example from a May1979 model (full run in may1979-example.md): the Gaussian process infers a rather pessimistic evolution of the probability density (the grey distribution becomes the black distribution when unharvested for 20 years (OptTime)):

GP transition function

[Figure: plot of chunk gp-F-sim]

Whereas the actual transition function moves the stock to a tight window around the high carrying capacity:

true transition function

[Figure: plot of chunk par-F-sim]

Often this results in a policy function that harvests all the fish, since they won’t persist. Exploring approaches to avoid such solutions, such as adding a reward for leaving some standing stock at the boundary time (issue #10).
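The mechanism is easy to reproduce in a toy stochastic dynamic program. The Python sketch below is an illustration under simplifying assumptions and not the code used for these results. It uses a coarse stock grid, a reward equal to the harvest, three equally likely multiplicative shocks, and two made-up growth functions standing in for an optimistic and a pessimistic estimate of the dynamics.

```python
def transition_matrix(grid, f, shocks=(0.8, 1.0, 1.2)):
    """P[i][j]: probability of moving from escapement grid[i] to stock grid[j].

    Next year's stock is shock * f(escapement), snapped to the nearest grid point.
    """
    n = len(grid)
    P = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(grid):
        for z in shocks:
            target = z * f(s)
            j = min(range(n), key=lambda k: abs(grid[k] - target))
            P[i][j] += 1.0 / len(shocks)
    return P


def optimal_escapement(grid, P, horizon, discount=0.95, scrap=0.0):
    """Backward induction; returns the first-period escapement for each stock level."""
    n = len(grid)
    V = [scrap * x for x in grid]  # value of stock left standing at the terminal time
    policy = [0] * n
    for _ in range(horizon):
        continuation = [discount * sum(P[s][j] * V[j] for j in range(n)) for s in range(n)]
        new_V = []
        for i in range(n):  # current stock is grid[i]; escapement cannot exceed it
            values = [grid[i] - grid[s] + continuation[s] for s in range(i + 1)]
            best = max(range(i + 1), key=lambda s: values[s])
            policy[i] = best
            new_V.append(values[best])
        V = new_V
    return [grid[s] for s in policy]


def persistent(s):
    """Hypothetical Beverton-Holt-style growth: the stock sustains itself."""
    return 2.0 * s / (1.0 + s / 10.0)


def collapsing(s):
    """Hypothetical non-persistent dynamics: the stock halves each year."""
    return 0.5 * s


grid = [float(x) for x in range(11)]  # stock sizes 0, 1, ..., 10
results = {}
for name, f in [("persistent", persistent), ("collapsing", collapsing)]:
    results[name] = optimal_escapement(grid, transition_matrix(grid, f), horizon=20)
    print(name, results[name])
```

Under the collapsing dynamics with zero scrap value, any fish left in the water is worth less next year than it is today, so the computed policy leaves no escapement at any stock level. Under the persistent dynamics the same algorithm leaves stock behind to grow.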

Multi-species examples (issue #7)

multidimensional.md

Fragility of parametric rigidity examples

  • infer under BH and simulate under allee
  • Infer ricker, simulate under BH
  • Other examples?

MCMC

Examples of controlling priors, resulting posteriors. See yesterday’s notes for details

additional R software support

Have been focusing recently on the MCMC implementation for treed Gaussian Processes, provided in the tgp package.

Lots of implementations of Gaussian processes exist in R in geospatial stats packages (e.g. Kriging implementations), including some that offer fully hierarchical Bayesian approaches with a variety of twists:

  • psgp Projected Spatial Gaussian Process (psgp) methods, Implements projected sparse Gaussian process kriging for the intamap package
  • gstat
  • geoR
  • spBayes spBayes fits univariate and multivariate spatial models with Markov chain Monte Carlo (MCMC).
  • ramps Bayesian geostatistical modeling of Gaussian processes using a reparameterized and marginalized posterior sampling (RAMPS) algorithm designed to lower autocorrelation in MCMC samples. Package performance is tuned for large spatial datasets.

From the commit log…
