[wdmmg-dev] Removing aggregated entries

Sat Jun 25 16:58:38 UTC 2011

On 25 June 2011 17:48, Friedrich Lindenberg
<friedrich.lindenberg at okfn.org> wrote:
> Hi all,
>
> I've just pushed feature-no-aggreagte-entries, a branch without
> aggregated entries. Creating this involved the following changes:
>
>  * Re-implementing the views system to work against cubes (faster now)
>  * Re-implementing the old aggregator API to work against cubes (fixes #119)
>  * Remove most of the aggregation code
>  * Remove all references to is_aggregate etc. from code base,
> including entry-page entry browser
>
> While this greatly simplifies OpenSpending, I'm sure Rufus will
> disagree. Here are my arguments for merging this in:

My main objection would be the 'fait accompli' aspect :-) That said,
given that "we" implemented aggregates in cubes rather than in the
main system (why did that happen ...?), it does make sense to stop
doing things twice.

>  * Nobody wants to use aggregate entries: they are terribly
> distracting in navigation and viewing them is weird.

But you view aggregates all the time - all our views are built off
aggregates. The real question is whether you view them like normal
'entries'.
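
As a rough illustration of that distinction (the field names and the
`aggregate` helper are made up for this sketch and are not the cubes
API), a view can sum entries along a dimension without the totals ever
being stored or browsed as entries themselves:

```python
from collections import defaultdict

# Made-up sample entries; "region" and "amount" are illustrative field names.
entries = [
    {"region": "north", "amount": 10.0},
    {"region": "north", "amount": 5.0},
    {"region": "south", "amount": 7.5},
]


def aggregate(entries, dimension, measure="amount"):
    """Sum `measure` over the entries, grouped by the value of `dimension`."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry[dimension]] += entry[measure]
    return dict(totals)


# The view reads from `totals`; nothing is written back into `entries`.
totals = aggregate(entries, "region")
```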

>  * We only ever created a partial set of aggregate entries and thus it
> makes no sense to search them (obvious alternative here is to
> implement full pre-agg, of course, but that in itself solves nothing)

Agreed that a partial set isn't very useful, but that's why I always
said we need the full set (or at least all the main aggregates we'd want).

>  * They won't build compare-o-tron for us: for this we need something
> with clear labels, i.e. a view that combines entities, classifiers and
> entries and assigns each a unique value. Aggregate entries don't do
> this.

Certainly agree that we need to think through comparotron requirements
more, but you're going to end up needing a search index that looks like
aggregates.

>  * Given that we seem to be having problems with the size of the
> "entries" collection, we might not want to put aggregates in there,
> where their retrieval is particularly time sensitive.

Given that we think there will be as many aggregates as entries
(possibly more), I think we will have issues anyway. My main suggestion
(which I have been thinking about for a while) is to move to a system
in which each dataset gets its own set of collections -- i.e. we move
to a real datamart model. At the moment I think there is zero benefit in
having entries from different datasets in one big table ...
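
To make the datamart idea a bit more concrete, here is a minimal sketch
of routing each dataset to its own collections. Everything in it is an
assumption for illustration: the naming scheme, the `collection_name`
helper, the dataset names and the in-memory `Store` standing in for the
database are hypothetical, not existing OpenSpending code.

```python
from collections import defaultdict


def collection_name(dataset, kind="entries"):
    """Hypothetical naming scheme: one collection per dataset and kind."""
    return "{0}__{1}".format(dataset, kind)


class Store(object):
    """In-memory stand-in for the database (illustration only)."""

    def __init__(self):
        self.collections = defaultdict(list)

    def insert(self, dataset, doc, kind="entries"):
        # Each document lands in the collection belonging to its dataset.
        self.collections[collection_name(dataset, kind)].append(doc)

    def find(self, dataset, kind="entries"):
        # Only this dataset's collection is read; other datasets are untouched.
        return list(self.collections.get(collection_name(dataset, kind), []))


store = Store()
store.insert("dataset_a", {"amount": 100})
store.insert("dataset_a", {"amount": 250})
store.insert("dataset_b", {"amount": 40})
# Aggregates for a dataset get their own collection next to its entries.
store.insert("dataset_a", {"amount": 350}, kind="aggregates")
```

With this layout a query against one dataset never scans another
dataset's entries, and aggregates stay out of the entries collection.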

>  * It makes things simpler. One aggregation system is better than two,
> and the newer one works better.
>
> What do people think?

I think we should go ahead because we have already ended up with
aggregates in two places. Going forward, let's try to plan major
architectural changes clearly in advance, and either reach reasonable
consensus or clearly delegate the decision to a person or a
group.

Rufus