[wdmmg-dev] Removing aggregated entries

Sat Jun 25 17:13:54 UTC 2011

On Sat, Jun 25, 2011 at 6:58 PM, Rufus Pollock <rufus.pollock at okfn.org> wrote:
> On 25 June 2011 17:48, Friedrich Lindenberg
> <friedrich.lindenberg at okfn.org> wrote:
>> Hi all,
>>
>> I've just pushed feature-no-aggreagte-entries, a branch without
>> aggregated entries. Creating this involved the following changes:
>>
>>  * Re-implementing the views system to work against cubes (faster now)
>>  * Re-implementing the old aggregator API to work against cubes (fixes #119)
>>  * Remove most of the aggregation code
>>  * Remove all references to is_aggregate etc. from code base,
>> including entry-page entry browser
>>
>> While this greatly simplifies OpenSpending, I'm sure Rufus will
>> disagree. Here are my arguments for merging this in:
>
> My main objection would be the 'fait accompli' aspect :-) That said
> given that "we" implemented aggregates in cubes rather than in the
> main system (why did that happen ...?) it does make sense to get rid
> of doing things twice.

Yeah, it started off innocently (can I switch views to cubes?), got
worse (and the old API?) and then mutated (ok, let's kill it). No evil
plan here, just a Saturday afternoon. I only moved the work onto a
branch halfway through.

>>  * Nobody wants to use aggregate entries: they are terribly
>> distracting in navigation and viewing them is weird.
>
> But you view aggregates all the time - all our views are off
> aggregates .... The real question is whether you view them like normal
> 'entries'.

You're right, of course. I should have said: nobody wants to see
aggregates we made up; people would much rather see the ones government
made up. They're both strange, but with ours you don't get to blame
elected officials ;-)

>>  * They won't build compare-o-tron for us: for this we need something
>> with clear labels, i.e. a view that combines entities, classifiers and
>> entries and assigns each a unique value. Aggregate entries don't do
>> this.
>
> Certainly agree that we need to think through comparotron requirements
> more but you're going to end up needing a search index that looks like
> aggregates.

It will, but I think we haven't yet begun to understand how hard
(domain specific) making this set of objects will be. Or at least I
haven't. This will be a good topic for the OS hackday, though :-)

>>  * Given that we seem to be having problems with the size of the
>> "entries" collection, we might not want to put aggregates in there,
>> where their retrieval is particularly time sensitive.
>
> Given that we think there will be as many aggregates as entries
> (possibly more), I think we will have issues anyway. My main suggestion
> (that I have been thinking about for a while) is to move to a system
> in which each dataset gets its own set of collections -- i.e. we move
> to a real datamart model. At the moment I think there is zero benefit in
> having entries from different datasets in one big table ...

You might be right after all. In a way, this is what the new
aggregation system already does:

> show collections;
changeobject
changeset
classifier
cubes.eu.default
cubes.eu.view_countries_country_dataset_flow_year
cubes.eu.view_default_article_chapter_flow_year
cubes.eu.view_default_article_flow_item_year
cubes.eu.view_default_chapter_flow_title_year
cubes.eu.view_default_country_flow_title_year
cubes.eu.view_default_dataset_flow_to_year
cubes.eu.view_default_flow_item_subitem_year
cubes.eu.view_default_flow_title_to_year
dataset
dimension
distincts__eu
entity
entry
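
To make the per-dataset layout concrete, here is a rough Python sketch,
not the actual implementation. It assumes the collection names above
follow the pattern cubes.<dataset>.view_<view>_<sorted dimensions>
(inferred from the listing only), and that a cube cell is simply the sum
of a measure over each distinct combination of dimension values. The
helper names, the "amount" field and the sample entries are all made up
for illustration.

```python
from collections import defaultdict


def cube_collection_name(dataset, view, dimensions):
    # Hypothetical helper: the naming pattern is inferred from the
    # `show collections` listing, not taken from the code base.
    return "cubes.%s.view_%s_%s" % (dataset, view, "_".join(sorted(dimensions)))


def aggregate(entries, dimensions, measure="amount"):
    # Sum `measure` for every distinct combination of dimension values.
    # Keys are tuples ordered by the sorted dimension names.
    dims = sorted(dimensions)
    totals = defaultdict(float)
    for entry in entries:
        key = tuple(entry[d] for d in dims)
        totals[key] += entry[measure]
    return dict(totals)


# Made-up sample entries, for illustration only.
entries = [
    {"dataset": "eu", "flow": "spending", "to": "DE", "year": 2010, "amount": 10.0},
    {"dataset": "eu", "flow": "spending", "to": "DE", "year": 2010, "amount": 5.0},
    {"dataset": "eu", "flow": "spending", "to": "FR", "year": 2010, "amount": 7.0},
]

dimensions = ["year", "flow", "to", "dataset"]
name = cube_collection_name("eu", "default", dimensions)
cells = aggregate(entries, dimensions)
```

Each dataset would then own one such collection per view, so nothing
aggregated has to live in the shared "entry" collection.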

Over the last week I have again been looking at using some kind of
graph store to reduce the amount of redundancy we need to maintain
(fighting RDBMSs till my last breath here), but at least for triple
stores the tooling is still so bad it's not even funny. The only
reason I haven't completely given up here is that the
URI-for-everything model has started to eat my brains.

(Rufus: post Riak argument here :-)

- Friedrich