[wdmmg-discuss] Next steps for backend work (Phase II)

Tue May 25 16:51:46 UTC 2010

In my email [1] earlier today I outlined some priorities for Phase II
and beyond. Based on that, from the backend perspective, the biggest
tasks (in *very* rough order of priority) are:

1. Support for editing and adding "core data".
  * Change to a more flexible data model (RDF-based)? -- see below
  * Requires revisioning and write interface

2. Classification work (more below)

3. Commenting and annotation
  * It may be useful to distinguish the two, with annotation being
structured (e.g. flag this, tag this, etc.)

4. New data: pog codes, COINS (?), ...

I've expanded on a couple of these items further below. Does this look
about right to people?

Regards,

Rufus

[1]: http://lists.okfn.org/pipermail/wdmmg-discuss/2010-May/000226.html

## 1. More flexible data model

Our current RDBMS-backed domain model isn't great. In particular, we
have had to "hack" in key-value type support. Going forward there is a
clear need to address this issue, especially before we add editing
support. From the start of this project we have thought about using a
triple store, as it would deliver:

  * Good, flexible data structure
  * Easier integration with external RDF data sources
    * we are using a lot of data from government and elsewhere, much
of which is increasingly available as RDF
  * Reasonable integration with existing toolsets

Recent work by Will Waites on the ORDF library and bibliographica
(<http://bibliographica.org/docs/ordf/>) has shown that triple stores
are workable for this kind of application. I therefore think there is
a good enough basis for trying out a spike solution asap to test
whether the approach works for us.
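
To make the flexibility argument concrete, here is a minimal sketch in
plain Python of data held as (subject, predicate, object) triples. It is
only an illustration under stated assumptions: it does not use ORDF or
any real triple store, and every identifier and value in it (`entry:1`,
the `wdmmg:` prefix, the pog code value) is hypothetical.

```python
from collections import defaultdict

# Each fact is one (subject, predicate, object) triple.
# All identifiers and values here are made up for illustration.
triples = [
    ("entry:1", "wdmmg:amount", 1200),
    ("entry:1", "wdmmg:function", "Health"),
    ("entry:1", "wdmmg:region", "England North West"),
    ("entry:2", "wdmmg:amount", 300),
    ("entry:2", "wdmmg:function", "Social Protection"),
]


def describe(store, subject):
    """Collect every predicate and its objects recorded for one subject."""
    props = defaultdict(list)
    for s, p, o in store:
        if s == subject:
            props[p].append(o)
    return dict(props)


def subjects_with(store, predicate, obj):
    """Return the subjects that carry a given predicate/object pair."""
    return sorted(s for s, p, o in store if p == predicate and o == obj)


# A new kind of attribute needs no schema change: just add a triple.
# "X01" is a made-up placeholder, not a real pog code.
triples.append(("entry:2", "wdmmg:pog_code", "X01"))
```

The point of the sketch is that key-value style attributes, which we
currently have to "hack" in, are the native shape of the data here.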

## 2. Accounts and Classifiers

  * Recap of basic Account-Transaction-Posting (ATP) model
  * Importance of associating accounts to something real
  * However, Accounts are often used for 2 purposes: actual
transactions, and pseudo-classifiers
  * For example, in your personal accounts you might create an account
called "air travel". This does not represent a "real" entity or
account but is really a classifier.
  * While this is a common usage, in our case it causes issues: the two
usages are orthogonal. "Real" accounts are anchored to something
"real", whereas classifiers are more like tags and may only converge
to shared usage over time (if at all).

Put succinctly: "classifiers are messy, while ATP should not be"

We therefore need to explicitly augment our domain model with:

  * Classifiers -- e.g. a Function (such as Health, Social
Protection), or a Region (such as England North West).
    * Note that we may also want to add normal attributes to an
account, for example to say that this account is for an entity based
in this area, but this is different from a classifier.
  * Classifications: an association between an Account and a Classifier
with a percentage or amount. It means that this amount of the money in
this Account is classified with this Classifier.
    * a Classification may also be associated with a Transaction or
even a Posting
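
The Classifier / Classification split described above could be sketched
roughly as follows. This is one possible shape, not a settled design: the
class and field names, the use of a fractional `share` instead of an
absolute amount, and the example account are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Account:
    """A "real" account, anchored to an actual entity."""
    name: str
    total: Decimal


@dataclass(frozen=True)
class Classifier:
    """A tag-like label within a scheme, e.g. Function or Region."""
    scheme: str
    label: str


@dataclass(frozen=True)
class Classification:
    """Says that a share of an Account's money falls under a Classifier."""
    account: Account
    classifier: Classifier
    share: Decimal  # fraction between 0 and 1 (assumed representation)

    def amount(self) -> Decimal:
        return self.account.total * self.share


def classified_total(classifications, classifier):
    """Sum the money attributed to one classifier across all accounts."""
    return sum(
        (c.amount() for c in classifications if c.classifier == classifier),
        Decimal("0"),
    )


# Hypothetical example data.
dept = Account("Example Department", Decimal("1000"))
health = Classifier("Function", "Health")
north_west = Classifier("Region", "England North West")
links = [
    Classification(dept, health, Decimal("1")),
    Classification(dept, north_west, Decimal("0.25")),
]
```

Keeping the Classification as its own object means the messy,
tag-like side can change without touching the accounts themselves.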