[openspending-dev] Joins across datasets using cubes

Friedrich Lindenberg friedrich at pudo.org
Thu Jan 8 15:06:40 UTC 2015


Hey Nathan,

On Thu, Jan 8, 2015 at 3:48 PM, Nathan Hilbert <nathanhilbert at gmail.com>
wrote:

> Hopefully the first dev meeting today was productive.
>

As they say on Facebook: it's complicated


> Is this archived somewhere?  I'll be attending the second one my evening.
>

Yep, here's a transcript: https://botbot.me/freenode/openspending/


> I was wondering if you have given any thought to joining datasets using
> cubes.
>

Just to be clear: you're talking about linking two datasets via a shared
dimension (e.g. geography, account codes or something like that), rather
than querying to two datasets at the same time (like a SQL UNION), right?


> It seems like it should be straightforward, and I understand how it is
> done against a SQL DB using the model.json; however, there is a lack of
> examples doing the same thing using a provider extension.
>

I think the provider would have to implement the ``dimension`` and
``public_dimensions`` functions, which could map such "conformed
dimensions" (we're all learning Stefan Urbanek's vocabulary at the moment)
onto a shared classification table.

In OpenSpending, I could imagine there also being a set of mapping tables
which would translate from the dataset-native classification scheme (e.g.
the German Funktionenplan) to a globally accepted standard classification
(e.g. the UN Classification of Functions of Government). If you can do the
linkages before loading, though, that mapping would not be required.


> For my use case, it would be a snowflake schema with geography in the
> middle, but I think we would want to extend this out to include joining on
> arbitrary dimensions.  I've been trying to understand the cube function
> override from the ModelProvider class a bit more as it seems like the
> override for OS is bypassing some of the work that needs to be done for
> joins and mapping.
>

Not really, I'm just passing the table and column names in OpenSpending
over to cubes in a dict. That's pretty much just auto-generating a normal
cubes model from the OS model.


> If you have seen any examples or can point me in a general direction, it
> would be much appreciated.
>

I'm not quite sure. I think the cubes docs aren't explicit about this, but
[1] can either be cube-local or warehouse-global, in which case the cube
would be constructed with the ``dimension_links`` argument, rather than the
``dimensions`` argument used in my current mock implementation.

[1]
http://pythonhosted.org/cubes/reference/model.html#dimension-hierarchy-and-level



> By the way, I can now see why you switched it cubes.  It will cut down
> substantially on the time I thought I had to put into the aggregate
> functionality.
>

Yeah, a lot of kudos to Stefan on this one, he's really done an impressive
job with the library.

Cheers,

- Friedrich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.okfn.org/pipermail/openspending-dev/attachments/20150108/2397e42e/attachment.html>


More information about the openspending-dev mailing list