Data to Knowledge Day II and other notes

Great second day at the Data to Knowledge conference (a few notes below).
Also I have been trying to move a few projects forward during the commute and break. I’ve sent Alan my revisions on the prosecutor’s fallacy manuscript, getting pretty close. Following up on a query from John, ran a simulation showing the impact of having a delay in the timeseries before the slow change towards the bifurcation begins. Certainly weakens power but does not preclude the linear-change model detecting the change, see example Also moving towards scheduling a call for the pdg-control manuscript on stochastic policy costs; some notes below.

Data to Knowledge Conference, Day 2

David A. Bader, algorithm design / HPC perspective


  1. HPC on big data (massive graphs)
  2. Streaming analytics: Today’s challenges aren’t dense linear algebra.
  3. Informational visualization - more data on graph than we have pixels
  4. Hetergeneous systems - a system that can reconfigure memory hierarchy on the fly
  5. energy efficiency (moving data takes high energy as well as time)

This is not the time to get stuck in a single paradigm! Scientific computing != MPI Big data != Hadoop/Map-reduce

External disk -------------------------------------------- Internal memory
Hadoop -- traditional database -- cloud/cluster -- large shared memory
Easy programming -------------- harder
low performance --------------- higher performance
high energy use --------------- low energy use

Latency on memory (random access some address in a big global index) dominates over floating point calculations Cray XMT 512 processor, 128 threads/processor, 64TB global memory, no cache.

Jeff Hawkins: Brain as a streaming processor

learning isn’t about strengthening weights on synapses – they form and vanish all the time. connections are binary, with scalar permanence.

The two-Layer learning system (classification problem)

  • Layer 0 is the original problem:
  • receive labeled and unlabeled data
  • train a classifier on labeled example, predict for others.
  • Layer 1 meta-classifier which tries to learn over which data layer 0 does well (using binary classification on layer 0 as true or false)

Policy Costs

Framing the question: Need to discuss these issues, then get a someone to be point-person on each?

  • Policy costs – Why this is important, what’s already known.
  • Typical ways these concerns are addressed (i.e. quadratic costs on control), similarity & differences.
  • Discussion of policy costs as “Regularization”
  • Why stochastic?
  • Background: Reed model.

Results so far: (discuss these, decide what needs more study)

  • Introduce & characterize the impact of different norms: L1, L2, asymmetric, fixed costs
  • Comparison to Reed
  • The double-pay of policy costs: Paying for the transactions, difference from optimal policy
  • Comparisons between these norms apples-to-apples.

Discussion/Conclusion – what do we highlight?

  • Must first answer - which journal/audience?


Modified css to use twitter-bootstrap striped tables as table default, since markdown won’t add a class on tables automatically. Hopefully doesn’t cause trouble with other table layouts.

Note that xtable can handle this by toggling options"

html.table.attributes=getOption("xtable.html.table.attributes", "class=table-striped"),

so setting html class of tables should work fine in knitr. On the flip side, sounds like pandoc would do better with ascii tables as input than with html tables?