[wdmmg-dev] types, names and URLs

Martin Keegan martin.keegan at okfn.org
Mon Sep 19 13:20:09 UTC 2011


We currently require that every dimension in a dataset be of one of three
types: entity, classifier or value. It has proven non-trivial to document
and explain these types to data wranglers.

Each of the types implies particular properties of the dimension:

1) whether it is the final source or recipient of funds
2) whether it is required to have a taxonomy, or required not to
3) whether it's permitted for certain breakdown/drilldown operations
4) whether it may be represented by multiple fields in the CSV file (and
accordingly what subformat it takes in JSON mappings)
5) whether it makes sense to give totals/means of the data (as opposed to

3) and 5) may be the same underlying concept.

These properties are largely orthogonal, but all are conflated in our
current system.

I propose that we abandon the value/classifier/entity distinction, and have
the following properties for dimensions:

1) a boolean "final endpoint" flag
2) a taxonomy, which may be null
4) that all dimensions' mappings be in the classifier/entity multicolumn

I think 2) is orthogonal to having taxonomies activate plugins.
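
To make the proposal a bit more concrete, here is a rough sketch of what a
dimension might look like once the three types are gone. This is only an
illustration: the class name, field names, taxonomy name and CSV column
names are all made up for the example and do not come from the existing
codebase.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dimension:
    """Hypothetical dimension with independent properties instead of a type."""

    name: str
    # Property 1: is this the final source or recipient of funds?
    final_endpoint: bool = False
    # Property 2: the taxonomy, or None if the dimension has none.
    taxonomy: Optional[str] = None
    # Property 4: every mapping uses the multicolumn form, so a dimension
    # always maps field names to CSV column names, even if there is only one.
    columns: Dict[str, str] = field(default_factory=dict)

    def has_taxonomy(self) -> bool:
        return self.taxonomy is not None


# Example dimensions (all names here are invented for illustration).
recipient = Dimension(
    name="to",
    final_endpoint=True,
    taxonomy="supplier",
    columns={"name": "recipient_id", "label": "recipient_name"},
)
amount = Dimension(name="amount", columns={"value": "amount"})
```

The point of the sketch is that each property can be set on its own: a
dimension can be a final endpoint with or without a taxonomy, and the
mapping always has the same shape.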

Classifiers and entities currently each have a namespace, which is
represented in URLs and is not dataset-specific. I'm not sure what I want
to see happen to that, and would appreciate suggestions.

