[open-science] toward a low-overhead fastpath for open-science "publishing"

Tom Roche Tom_Roche at pobox.com
Fri Aug 3 21:47:03 UTC 2012

Tom Roche Fri, Aug 3, 2012 at 1:50 AM
>> let's produce something like a fastpath [for] an open-science
>> [publication-oriented] process suitable for use by the
>> very-early-career or #scholarlypoor aspiring scientist[, where
>> "fastpath"] refers to a document for a demo. (Or the script for a
>> demo video.) [It] doesn't need to be, and usually isn't, the
>> "absolute best way" to do something (which usually varies by
>> practitioner): [it just needs to be so relatively] drop-dead easy
>> [that] "even Marketing can run it."

Or so easy that one can record it, put it on <your internet video site
here/>, and be reasonably confident that any given member of the target
audience can run it successfully.

Peter Murray-Rust Fri, 3 Aug 2012 08:43:21 +0100
> It isn't drop-dead easy to create!

... as noted in original

>> [A fastpath should] therefore usually [be] designed before, and
>> developed with, the code it documents

But, as I should have noted in the original (but added above), a
fastpath just needs to be *relatively* drop-dead easy. I.e., for a
ginormous task like a scientific project, a fastpath need only be more
deterministic and much, *much* faster. And it can be assembled @
runtime from previously-prepared components:

> [A fastpath would probably be] an awful lot and it will take a lot
> of coordination as well as implementation. Can it be done in stages?

Yes. What follows is

* just one way to do this. YMMV. Alternatives welcome.

* based on my current service/tool choices, which may be waay
  suboptimal, but are known to work ... somewhat :-) See problems
  noted in original.

* based largely on my experience as a corporate coder (through late
  naughties), which worked then (but goes stale fast in IT-time):

  My previous gig was developing IDEs for other corporate developers
  (very "meta") working mostly with java and XML, but also legacies like
  SQL and COBOL. (aside: COBOL is not dead--there's GLOCs of it in
  production, everywhere--it's undead :-) So we had to simulate the
  automation of relatively-specialized tasks for folks working in
  (code-wise) quite built-up but very closed environments. (Notably,
  FIRE: that's "where the money is," but those folks *absolutely* don't
  want their code made public.)

  Salesfolk (prettier than coders :-) would demo our wizzy tool du jour
  (e.g., a drag-n-drop diagram editor allowing one to represent
  architectures at a high level and interact with them via wizards
  (i.e., sequences of dialogs) for generating/modifying/deploying code)
  using previously-written fake-but-realistic-looking (I'll call that
  "PWFBRL") components. We used these canned components because actually
  writing them at "demo-time" would have been slow and boring and not
  what the target audience was interested in. So a demo-er would
  therefore spend a fair amount of time saying/doing things like
  (just OTTOMH):

  "So here's your legacy for creating a global-economy-destroying
  derivative contract [click on diagram node opens PWFBRL COBOL in an
  editor]. As you can see [scrolling], it calls foo, bar, and baz
  [clicks in code navigator opening other PWFBRL sources]. And here
  [more navigator clicks] are your triggers in the customer database
  [scrolling more editors showing PWFBRL SQL, putting more nodes on the
  diagram]. And here's your website for your vampire-squid clients [more
  clicks, scrolling, nodes, PWFBRL java, HTML, XML]. Now we're gonna
  invert control [a major naughties buzzphrase] with callbacks in the
  [your framework here--java is all about frameworks :-] façade [and
  about design patterns :-] [lots more clicks, drags, some typing]. Now
  let's recompile [click], and run this sucker [open browser, type a
  little]--et voilà!" [applause]

ISTM CKAN could do likewise. Here's an example (for convenience
only--nothing set in stone, noting caveats from first post in this
thread, and I may be missing something(s)):

1 Know/determine your target audience. (Always job #1.)

2 Imagine (better: copy/modify) a publication-oriented science project
  that is

* sufficiently generic to be understandable by the target audience

* sufficiently interesting to appeal to the target audience

Consider skipping ahead to step=7 (usecase generation), or just do
that in parallel with the following steps (3..6):

3 Project goal: formally-structured content (FSC):

3.1 Imagine (better: copy/modify) one or more FSC instances that the
    project owner(s) would want to generate (e.g., an article, poster,

3.2 Choose FSC generation (FSCG) tool/service(s):

    Probably your major design decision. You will be criticized--get
    over it !-) You want something that outputs pretty FSCs, for which
    inputs are easy/quick to author, that is easy to set up, that
    operates robustly, has good support/community, and <your criteria

    Test your choice: create simple input(s), feed to your FSCG,
    inspect output, loop until you're satisfied.

4 Project-oriented content (POC):

* Data: create/steal some project-relevant data, and decide how to
  manage it (e.g., with thedatahub).

* Code: create/steal some code that does something with the data, and
  decide how to manage it (e.g., with git+github).

* References: fake/steal some references about the project domain, and
  store them appropriately.

5 Get meta: "eat your own dogfood" by managing *this project* (the
  creation of a fastpath) using the tools/services you will be demoing
  in the fastpath. (Or not :-)

6 Create top-level node: see strategy item 1 in


6.1 Create something publicly-available (e.g., a wiki homepage) from
    which one should be able to discover project status, intent, and
    content. This is the project TLN.

6.2 Populate the TLN with POC (e.g., plans, status, visualizations) to
    scroll in the demo. (If a wiki, POC can be in other pages in the
    same wiki--i.e., implement the TLN as a site.)

6.3 Since users will be spending much time in the TLN, satisfy
    yourself regarding the ease of editing, navigating, and searching
    the TLN itself, as well as the content it contains/indexes.

6.4 Add your FSC input and output to the TLN. Repeat usability testing.

7 Create demo-worthy usecases. These might include any of the
  following, and new usecases can be added as you have resource and

* adding a new phase/activity to the TLN/site about incorporating new
  data (which needs to be added to the datastore, etc)

* adding new POC to the TLN. E.g., you need a new visualization, so
  you create or edit new code (which you manage), you run the code,
  and you add the visualization to the project site (and perhaps the
  FSC).

* editing/regenerating your FSC (adding new code, figures, etc).

N Write something out. Demo on yourself. Demo to a friendly audience.
  Put a rough draft on your blog. "Go live" to a "real" audience.
  Record a screencast, release it to the world. Add new usecases,
  refactor the whole thing. Choose new tools/services. Smile when
  folks fork it, change everything, and proclaim your inferiority.
  Change the world and die happy :-)
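
The "test your choice" loop in step 3.2 (create simple inputs, feed
them to your FSCG, inspect the output, repeat) might be sketched as
below. This is only an illustration under assumptions: check_fscg and
toy_generate are hypothetical names, and toy_generate is a stand-in for
whatever FSCG tool/service you actually choose (which you would wrap in
a function that takes input text and returns output text).

```python
from typing import Callable, Iterable, List, Tuple


def check_fscg(generate: Callable[[str], str],
               cases: Iterable[Tuple[str, List[str]]]) -> List[str]:
    """Run each simple input through the generator and collect problems.

    `generate` wraps your chosen FSCG (hypothetical here); each case
    pairs an input with strings that the output must contain.
    """
    problems = []
    for source, expected in cases:
        try:
            output = generate(source)
        except Exception as exc:  # a crash counts against robustness
            problems.append(f"{source!r}: generator failed: {exc}")
            continue
        for needle in expected:
            if needle not in output:
                problems.append(f"{source!r}: output lacks {needle!r}")
    return problems


def toy_generate(source: str) -> str:
    """Stand-in generator: first line becomes a title, the rest a body."""
    title, _, body = source.partition("\n")
    return f"<h1>{title.strip()}</h1>\n<p>{body.strip()}</p>"


# Simple inputs, each with what a satisfactory output should contain.
cases = [
    ("My poster\nOne sentence of results.", ["<h1>My poster</h1>", "results"]),
    ("Title only", ["<h1>Title only</h1>"]),
]
problems = check_fscg(toy_generate, cases)
print(problems or "all cases pass")
```

Loop until the problem list is empty, then add harder inputs (figures,
references, code listings) as your usecases from step 7 demand them.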

FWIW, Tom Roche <Tom_Roche at pobox.com>
