Sunday: a few TreeBASE R package updates

Working on a few updates to the TreeBASE package.  This fleshes out the basic functionality provided by the phylo-ws API.  It still needs a bit more testing of the possible queries, plus some bells-and-whistles options.  Meanwhile, I'm going to start looking at the metadata side with the OAI-PMH API.  With this I should be able to grab the metadata associated with a tree, or the tree associated with the metadata.  Should also be able to extend the queries to other databases such as Dryad.

  • Looking into what might currently exist: apTreeshape (Bortolussi et al. 2005) had a dbtrees function, which has since been removed. Scott Chamberlain is also exploring this a bit; sounds like a good person to talk to.

  • Fixed many of the XML parsing errors and warnings by moving to RCurl for queries:

    Use

    tt = getURLContent(query, followlocation = TRUE)
    # instead of:
    # con = url(query)
    # tt = readLines(con, warn = FALSE)
    # close(con)

  • This adds RCurl as a dependency. (Following Duncan’s suggestion to keep from polluting the namespace, I’ve moved dependencies into imports instead).

  • Implemented the remaining use cases; these other query types still need testing.  A few extra tests have been added to the demos.

  • Automatically matches section, rather than specifying as a third argument.

  • Implemented handling for boolean logic, though it's not clear that the API handles these cases correctly. It seems to do OR, but treats AND and NOT as if they were also OR.

  • Joined Treebase-devel@lists.sourceforge.net.  Should probably post about this work and ask about this boolean trouble.

  • Made output a multiPhylo object.
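A rough sketch of how the query side above could fit together: build a boolean query string, assemble the search URL, and fetch it with RCurl. The base URL, the predicate name (dcterms.contributor), the lowercase operator spelling, and the helper names build_query, build_url and fetch_query are all assumptions for illustration, not the package's actual internals, and the URL leaves out any format or schema parameters the real service may want.

```r
# Sketch only: base URL, predicate names and URL layout are assumptions
# about the phylo-ws search interface, not a tested client.
phylows_base <- "http://purl.org/phylo/treebase/phylows"

# Hypothetical helper: join predicate=value terms with one boolean operator.
build_query <- function(terms, op = c("or", "and", "not")) {
  op <- match.arg(op)
  clauses <- paste0(names(terms), "=", unname(terms))
  query <- paste(clauses, collapse = paste0(" ", op, " "))
  utils::URLencode(query)  # percent-encodes the spaces, leaves "=" alone
}

# Hypothetical helper: assemble the search URL for a given section.
build_url <- function(query, section = c("study", "tree")) {
  section <- match.arg(section)
  paste0(phylows_base, "/", section, "/find?query=", query)
}

# Fetch with RCurl, following redirects (needs RCurl and a network
# connection; defined here but not called).
fetch_query <- function(url) {
  RCurl::getURLContent(url, followlocation = TRUE)
}

q <- build_query(c(dcterms.contributor = "Huelsenbeck",
                   dcterms.contributor = "Ronquist"), op = "and")
search_url <- build_url(q, "study")
```

Comparing what fetch_query(search_url) returns for op = "and" against op = "or" would be one way to check whether the service really treats every operator as OR.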

To-Do

  • Should add facilities to return only trees with branch lengths, etc. DONE 2011-05-11

  • Should figure out how to pull metadata from returned matches. Possibly grab the TB id number, and then use a separate query to pull that up. DONE 2011-05-11

  • Consider maximum number of returns on a query. DONE 2011-05-11

  • Hm: given a study id like TB2-S2377, why do these both return unrelated trees, and what is an id.tree?

studies <- search_treebase("2377", by="id.study")
tree <- search_treebase("2377", by="id.tree")
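For the branch-length item in the to-do list, a minimal sketch of the idea using ape: collect the trees in a multiPhylo object and keep only those with an edge.length component. The Newick strings are made-up toy data standing in for whatever a query returns; this is not how the package itself necessarily does the filtering.

```r
library(ape)

# Toy Newick strings standing in for query results (made-up data);
# the second tree has no branch lengths.
newick <- c("((a:1,b:1):1,c:2);",
            "((a,b),c);",
            "((a:0.5,c:0.5):1,b:1.5);")

# Parse each string into a phylo object, then mark the list as multiPhylo.
tree_list <- lapply(newick, function(x) read.tree(text = x))
trees <- tree_list
class(trees) <- "multiPhylo"

# A phylo object without branch lengths has no edge.length element.
has_lengths <- vapply(tree_list,
                      function(tr) !is.null(tr$edge.length),
                      logical(1))

# Subsetting a multiPhylo object keeps its class.
with_lengths <- trees[has_lengths]
```

Here with_lengths holds the first and third toy trees and can be passed to anything that accepts a multiPhylo object.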

Metadata and OAI-PMH

  • Need to get the study id from the phylo-ws query and then look up the metadata.  See the TreeBASE wiki.  (Pretty sparse, maybe I should add something.)  DONE 2011-05-11
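A sketch of what the metadata lookup could look like as an OAI-PMH request. The verb (GetRecord) and the metadataPrefix value (oai_dc) come from the OAI-PMH standard; the endpoint URL, the identifier format, and the helper name oai_record_url are placeholders I've made up, so the real values need to be checked against the TreeBASE documentation.

```r
# Assumption: placeholder endpoint, to be replaced with the real
# OAI-PMH base URL of the repository.
oai_base <- "http://example.org/oai"

# Hypothetical helper: build a GetRecord request URL for one record.
oai_record_url <- function(identifier, base = oai_base, prefix = "oai_dc") {
  paste0(base,
         "?verb=GetRecord",
         "&metadataPrefix=", prefix,
         "&identifier=", utils::URLencode(identifier, reserved = TRUE))
}

# Hypothetical identifier: the real format would be derived from the
# study id returned by the phylo-ws query.
record_url <- oai_record_url("hypothetical-id-2377")
```

The XML that comes back from such a request could then be fetched and parsed the same way as the phylo-ws results.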

Example queries:

References