[wdmmg-discuss] Interfacing the presentation layer to the server
Alistair Turnbull
apt1002 at goose.minworks.co.uk
Tue Mar 30 16:51:46 UTC 2010
I've just spoken with Dave Boyce who will be writing the snazzy Flash
presentation layer for our site, to try to agree what kind of API it will
use to request data from the server. This post is a write-up of the
conversation, before I forget it.
Scope
-----
In the short term, we are going to concentrate on the CRA data set only.
However, we do think it is worth parameterising the things which are
likely to be different for other data sets in the future. The main
examples of future data sets are the local spending reviews and COINS
(hopefully!).
The main thing that needs to be parameterised (and which was not
parameterised in the prototype) is the list of axes along which the data
can be broken down. For the CRA, the axes are: Dept, Region, POG, COFOG.
For other data sets, the axes are likely to be different.
More generally, it will no longer do to compile the data into the Flash
program. The Flash program will instead need to request data from the
server.
The time axis
-------------
We agreed that the time axis is a special case. It differs from the other
axes in several ways:
- The time axis is dense. By this I mean that if you find a (Dept,
Region, POG, COFOG, time) combination for which the amount of spending is
non-zero, and then change the time only, the new combination will probably
also have a non-zero amount of spending. This is often not true of the
other axes.
- The time axis is ordered.
- The time axis is roughly the same for all data sets.
- Users will generally not want to sum along the time axis.
Dave also requested that we treat the values on the time axis as opaque
strings, not as dates. In the data available to us, the column headings
are of the form "2008-09", for example. The data in that column is a total
for the period from 1st April 2008 to 31st March 2009, I think. Until now
I have been loading all the spending in that column into the database with
a timestamp of 1st April 2008, but Dave is quite right to point out that I
am just inventing precision that doesn't really exist.
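To make the difference concrete, here is a rough sketch (Python, purely for
illustration; the field names are invented) of the two ways of recording a
figure from the "2008-09" column:

    from datetime import date

    # What I have been doing: inventing a precise start date for the period.
    row = {"dept": "Dept032", "amount": 56.1, "time": date(2008, 4, 1)}

    # What Dave suggests: keep the column heading as an opaque label.
    row = {"dept": "Dept032", "amount": 56.1, "time": "2008-09"}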
Division of labour: aggregation
-------------------------------
We discussed whether aggregation should be done on the server or the
client. (By "client" I mean the Flash program that Dave is writing,
which will run in the user's web browser).
The advantage of doing aggregation on the server is that it reduces the
amount of data that needs to be sent to the client. To take Dave's
example, the prototype only ever shows about 100 numbers at the same time.
It is probably never worth downloading all 150,000 numbers in the
database.
The advantage of doing aggregation on the client is that client CPU is plentiful.
It doesn't much matter if the client has to pause to do a large
calculation; it will not affect anybody else. However, if the server has
to pause then that limits the number of users, and it becomes a
significant expense.
We might be able to get the best of both worlds, by doing the calculations
on the server and caching the results. It might have to be quite a large
cache, because there are many subtly different requests that the client
can make. The hope is that a small subset of the possible requests will
account for 95% of the traffic, so that the server will only have to stop
and think rarely.
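To give an idea of what I mean by a cache, here is a minimal sketch in
Python. Everything here is hypothetical: aggregate() stands in for the real
(expensive) database query, and the dictionary stands in for whatever cache
store we end up using.

    import hashlib
    import json

    _cache = {}  # in practice perhaps memcached or files on disk

    def aggregate(slice_name, axes, filters):
        # Placeholder for the real database aggregation query.
        raise NotImplementedError

    def aggregate_cached(slice_name, axes, filters):
        # Normalise the request into a stable key, so that equivalent
        # requests hit the same cache entry regardless of parameter order.
        key = hashlib.sha1(json.dumps({
            "slice": slice_name,
            "axes": sorted(axes),
            "filters": sorted(filters.items()),
        }, sort_keys=True).encode("utf-8")).hexdigest()
        if key not in _cache:
            _cache[key] = aggregate(slice_name, axes, filters)
        return _cache[key]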
We decided to try using a cache at first. If that works, it will have been
the right decision. If it doesn't work, we will give up and instead do the
calculations on the client. We calculated that the client will only have
to download about 2Mb of data to get a copy of the entire data set, so
this fall-back option is certainly feasible.
Division of labour: search
--------------------------
Dave wants to build a search facility into the client. This is fine by me.
He will probably not need to download any extra data to make it work,
beyond what he would have needed anyway for the presentation.
There will probably also be a search facility for those browsing the store
in HTML form: the minority of users who click on the "Edit" button.
However, this is my problem, not Dave's.
New requirements: aggregator
----------------------------
My first stab at the API for the aggregator seems to be roughly right in
some respects, and completely wrong in others. I need to make the
following changes:
- It must return time series, not just totals for the period. As
mentioned earlier, we want to treat the time axis as dense, while keeping
the other axes sparse.
- It must support filtering by key/value pairs. For example, if the user
drills down to a particular COFOG code and region, then the server must be
able to return data for just that COFOG and region.
With these changes, the JSON returned (server to client) by the aggregator
will probably turn out something like this:
{
    "metadata": {
        "slice": "cra",
        "filters": {"cofog": "2.4.1", "region": "London"},
        "axes": ["dept", "cofog", "pog", "region"],
        "times": ["2003-04", "2004-05", "2005-06", etc... ]
    },
    "results": [
        [
            ["Dept032", "2.4.1", "S71000502", "London"],
            [56.1, 50.1, 51.5, etc... ]
        ], [
            ["Dept032", "2.4.1", "S71000503", "London"],
            [19.8, 21.5, 20.0, etc...]
        ], etc...
    ]
}
(The "slice", "filters" and "axes" fields in the metadata are only there
to confirm that the server has understood the client's request).
This JSON format is reasonably compact (ignoring white space). If you
downloaded the entire data set in this form, broken down as much as
possible, it would only come to about 2Mb.
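As an aside, unpacking this format is only a few lines of code. The real
client is Flash, but here is a sketch in Python just to show the shape of
the data (the function name is made up):

    import json

    def parse_aggregate_response(raw_json):
        # Turn the aggregator's JSON into a mapping from axis values to
        # a time series of amounts.
        response = json.loads(raw_json)
        times = response["metadata"]["times"]
        series = {}
        for key, values in response["results"]:
            # 'key' is e.g. ["Dept032", "2.4.1", "S71000502", "London"],
            # in the same order as metadata["axes"]; 'values' lines up
            # with metadata["times"].
            series[tuple(key)] = dict(zip(times, values))
        return series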
I need to work out what the request URL should look like for the above
JSON response. Probably not far off my first stab, but the filtering needs
some thought. I will try to integrate the filtering with the ugly
"spender_key", "spender_value" mechanism, in the hope of coming up with
something that is tidier overall.
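Purely as a strawman (none of the parameter names below are settled), I
imagine the request ending up as something like:

    GET /api/aggregate?slice=cra&breakdown=dept,pog&cofog=2.4.1&region=London

i.e. one parameter naming the slice, one listing the axes to break down by,
and one key/value pair per filter.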
New requirements: other requests
--------------------------------
We also identified two other things that the client will need to ask of
the server:
- Slice metadata. This includes what axes are available, for example.
- Axis metadata. This includes an index explaining what all the codes
mean, for example.
We haven't gone into any detail yet.
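Purely to have something to react to (all of this is a guess), the slice
metadata might look vaguely like:

    {"slice": "cra",
     "axes": ["dept", "cofog", "pog", "region"],
     "times": ["2003-04", "2004-05", etc... ]}

and the axis metadata might map each code to a human-readable description:

    {"axis": "cofog",
     "labels": {"2.4.1": "...", etc... }}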
Best wishes,
Alistair