[wdmmg-dev] Proposed model overhaul

Tue Oct 4 13:06:14 UTC 2011

I'll gladly prepare this when I'm back online but I think these are
relatively distinct issues:

a) Do we want implicit processing options in the model, or do we want
to skip these and add explicit processing later (e.g. Refine), or is
this out of scope for openspending (my vote atm)?
c) How should the next version of the model look more generally?
d) How do we represent that in either an RDBMS or in mongo?

- Fr.

On Tue, Oct 4, 2011 at 2:55 PM, Martin Keegan <martin.keegan at okfn.org> wrote:
> Can we link in the "before" and "after" examples?
>
> On Tue, Oct 4, 2011 at 11:51 AM, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> Hi Martin,
>>
>> On Tue, Oct 4, 2011 at 12:36 PM, Martin Keegan <martin.keegan at okfn.org> wrote:
>>>>> a payment in respect of which we want to record a value in more than
>>>>> one currency?
>>>>
>>>> This, otoh, feels like it's two ways of looking at the same payment
>>>> (i.e. a split between currencies would duplicate the amount in the
>>>> DB). One way to type for currency would be to have a flag on the
>>>> measure to specify its currency on a per-measure basis, rather than
>>>> per-dataset. I actually like this a lot, since it emphasises the fact
>>>> that we deal with spending. We now can decide on how specific we want
>>>> to be there (i.e. "USD" or "2010 USD" or "31.12.2009 USD").
>>>
>>> So, the main case I am thinking of is:
>>>
>>> you have an NGO which spends money in several currencies, and someone
>>> has dumped their raw bank statements onto you (this is not
>>> unrealistic), and these are in more than one currency (this is very
>>> realistic indeed)
>>
>> I think we are talking about two separate issues here: one is a case
>> of having the right data (e.g. spending in multiple currencies) and
>> wanting to represent it - which is clearly within scope of the model
>> language.
>>
>> The other issue is having incomplete data and wanting to infer the
>> remaining data from what is there (e.g. filling up a measure with
>> currency conversion). This, IMO, is very much outside the scope of the
>> model language.
>>
>> If we are to include it, I propose we have a special section in the
>> model (e.g. "pre-load-transform") and allow users to specify
>> JSON-wrapped snippets of JavaScript that will be executed by etl
>> before import. This could then have a library function to perform
>> currency conversion. The task specification language used could be
>> that of Google Refine (I still want someone to re-implement the Refine
>> service in Python for us and ScraperWiki to include).
>>
>> I just think that including more and more quasi-procedural options
>> into the model format will lead us to doom (or re-building Windows
>> NT).
>>
>> - Friedrich
>>
>> For reference, a Refine snippet:
>>
>> [
>>  {
>>    "op": "core/column-addition",
>>    "description": "Create column name_munged at index 2 based on column name using expression grel:value.strip().toLowercase()",
>>    "engineConfig": {
>>      "facets": [],
>>      "mode": "row-based"
>>    },
>>    "newColumnName": "name_munged",
>>    "columnInsertIndex": 2,
>>    "baseColumnName": "name",
>>    "expression": "grel:value.strip().toLowercase()",
>>    "onError": "set-to-blank"
>>  }
>> ]
>>
>
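
To make the "pre-load-transform" idea concrete, here is a minimal Python
sketch of applying the Refine operation quoted above to rows before import.
It is not a GREL interpreter and not the Refine API: the EXPRESSIONS table,
apply_column_addition() and pre_load_transform() are hypothetical names, and
only the single expression from the snippet is supported.

```python
import json

# Hypothetical lookup from a GREL expression string to a Python callable.
# Only the expression used in the quoted snippet is covered.
EXPRESSIONS = {
    "grel:value.strip().toLowercase()": lambda value: value.strip().lower(),
}

# The operation list from the quoted snippet, as it might sit in a model file.
OPERATIONS = json.loads("""
[
  {
    "op": "core/column-addition",
    "description": "Create column name_munged at index 2 based on column name using expression grel:value.strip().toLowercase()",
    "engineConfig": {"facets": [], "mode": "row-based"},
    "newColumnName": "name_munged",
    "columnInsertIndex": 2,
    "baseColumnName": "name",
    "expression": "grel:value.strip().toLowercase()",
    "onError": "set-to-blank"
  }
]
""")


def apply_column_addition(rows, op):
    """Apply one "core/column-addition" operation to a list of dict rows."""
    if op["op"] != "core/column-addition":
        raise ValueError("unsupported operation: %s" % op["op"])
    func = EXPRESSIONS[op["expression"]]
    result = []
    for row in rows:
        try:
            new_value = func(row[op["baseColumnName"]])
        except Exception:
            # Mirror "onError": "set-to-blank"; anything else is re-raised.
            if op.get("onError") != "set-to-blank":
                raise
            new_value = ""
        new_row = dict(row)  # leave the input row untouched
        # columnInsertIndex is ignored here because dict rows carry no
        # meaningful column position.
        new_row[op["newColumnName"]] = new_value
        result.append(new_row)
    return result


def pre_load_transform(rows, operations=OPERATIONS):
    """Run every operation in order, as an ETL step before import."""
    for op in operations:
        rows = apply_column_addition(rows, op)
    return rows
```

Calling pre_load_transform([{"name": "  Treasury "}]) would add a
name_munged column holding "treasury". A currency conversion would be one
more entry in the lookup table, which keeps the procedural part out of the
model itself.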

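The per-measure currency flag discussed in the quoted thread could look
roughly like the following. This is a sketch under assumptions, not the
current model format: the MODEL layout, the "currency" key and
typed_measures() are all hypothetical. Note that it only represents the
amounts that are present; it does not infer missing ones.

```python
# Hypothetical model fragment: each measure names its own currency,
# instead of one currency for the whole dataset.
MODEL = {
    "mapping": {
        "date": {"type": "value", "column": "date"},
        "amount_gbp": {"type": "measure", "column": "amount_gbp", "currency": "GBP"},
        "amount_usd": {"type": "measure", "column": "amount_usd", "currency": "USD"},
    }
}


def typed_measures(model, row):
    """Return {measure name: (amount, currency)} for measures present in a row."""
    result = {}
    for name, spec in model["mapping"].items():
        if spec.get("type") != "measure":
            continue
        raw = row.get(spec["column"])
        if raw in (None, ""):
            continue  # no inference: a missing amount stays missing
        result[name] = (float(raw), spec["currency"])
    return result
```

For a bank statement line such as {"date": "2011-10-04", "amount_usd":
"120.50", "amount_gbp": ""}, this yields only the USD measure, tagged with
its currency.
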
-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/

http://twitter.com/pudo
http://pudo.org