[openspending-dev] Joins across datasets using cubes
friedrich at pudo.org
Thu Jan 8 15:06:40 UTC 2015
On Thu, Jan 8, 2015 at 3:48 PM, Nathan Hilbert <nathanhilbert at gmail.com>
> Hopefully the first dev meeting today was productive.
As they say on Facebook: it's complicated
> Is this archived somewhere? I'll be attending the second one my evening.
Yep, here's a transcript: https://botbot.me/freenode/openspending/
> I was wondering if you have given any thought to joining datasets using
Just to be clear: you're talking about linking two datasets via a shared
dimension (e.g. geography, account codes or something like that), rather
than querying to two datasets at the same time (like a SQL UNION), right?
> It seems like it should be straightforward, and I understand how it is
> done against a SQL DB using the model.json; however, there is a lack of
> examples doing the same thing using a provider extension.
I think the provider would have to implement the ``dimension`` and
``public_dimensions`` functions, which could map such "conformed
dimensions" (we're all learning Stefan Urbanek's vocabulary at the moment)
onto a shared classification table.
In OpenSpending, I could imagine there also being a set of mapping tables
which would translate from the dataset-native classification scheme (e.g.
the German Funktionenplan) to a globally accepted standard classification
(e.g. the UN Classification of Functions of Government). If you can do the
linkages before loading, though, that mapping would not be required.
> For my use case, it would be a snowflake schema with geography in the
> middle, but I think we would want to extend this out to include joining on
> arbitrary dimensions. I've been trying to understand the cube function
> override from the ModelProvider class a bit more as it seems like the
> override for OS is bypassing some of the work that needs to be done for
> joins and mapping.
Not really, I'm just passing the table and column names in OpenSpending
over to cubes in a dict. That's pretty much just auto-generating a normal
cubes model from the OS model.
> If you have seen any examples or can point me in a general direction, it
> would be much appreciated.
I'm not quite sure. I think the cubes docs aren't explicit about this, but
 can either be cube-local or warehouse-global, in which case the cube
would be constructed with the ``dimension_links`` argument, rather than the
``dimensions`` argument used in my current mock implementation.
> By the way, I can now see why you switched it cubes. It will cut down
> substantially on the time I thought I had to put into the aggregate
Yeah, a lot of kudos to Stefan on this one, he's really done an impressive
job with the library.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the openspending-dev