[openspending-dev] Micro-services: OpenSpending's future architecture

Mon Dec 22 14:43:31 UTC 2014

Hi all,

Warning, long and quite theoretical, but still important to discuss. For
the short version please see the updated OpenSpending Enhancement
Proposal #1:
https://github.com/openspending/osep/blob/gh-pages/01-approach-and-architecture-of-openspending.md

Conway's Law says: "A system reflects the organizational structure that
built it".

In essence this is about how communication between team members
affects the architecture of the software they're building. Conway's Law
is very important for managers who are organizing a team. It tells
them to organize the team so that its communication lines reflect
the software. If the system architecture changes, the team also needs to
change.

The problem with these recommendations is that they don't reflect open
source development. In open source development projects, there is no
manager responsible for putting the team together or re-organizing the
team members. In open source software development the team is organic,
self-selected. People join and help out (in the capacity they can) when
and because they're interested.

So from an open source software development point of view we have to
turn this around. *The system should reflect the organizational
structure that can build it*. It's my hypothesis or conjecture or
whatever. What I'm trying to say is that it's the other way around in
open source software development: the system needs to be designed to
accommodate the community that uses and will therefore build it.

OpenSpending's community is complex. We all approach OpenSpending from a
different perspective and different use cases. We all have specialised
needs that have to do with how we interact with OpenSpending and what we
expect to get from it. The current system however reflects an
organizational structure of a close group of team members, who
communicate internally (which is kind of how OpenSpending came into
being). But that's not what OpenSpending should be. We're an open source
software project. As such we create software that should be able to
service many stakeholders, who communicate together publicly (scratch
the itch and all that).

So to cut this introduction short and save it from being too academic
and boring: I want to propose a different system architecture for
OpenSpending, one that invites a bigger community to participate and
more aptly reflects the organizational structure we want.

This new architecture is a micro-services solution. Lots of smaller
components that can talk together via defined protocols. A system that
can be extended to scratch itches and be simple enough to allow people
to jump into a small project without having to dive through a mountain
of (directly) unrelated code. So this takes David Parnas' information
hiding to the system level (we're not reinventing the wheel here).

This triggers more software engineering theory goodness in me (which
may or may not interest you as much as me). We've kind of covered
Parnas' Law: "Only what is hidden can be changed without risk". Parnas'
Law is a two-edged sword. For us it would allow us to change internals
without much risk of causing a butterfly effect within the OpenSpending
code base, but it also makes it very important for us to have a good
think about what we expose (so we don't have to change it very often,
because that's risky). So the interface is very important to get right,
but the flow behind it is something we can iterate fast on.

Another law that touches this architectural change is Lanergan's Law:
"The larger and more decentralized an organization, the more likely it
is that it has reuse potential". It's still worth reiterating even though
that's what we developers usually do. The micro-services should be as
general as possible so that they can get re-used. Who knows if they can
get used outside the OpenSpending community and we'll have an even
bigger group of people helping out with maintenance and development.
Again this is kind of a reverse of the law. Let's design with reuse
potential so we can have a large and more decentralized organization.

And lastly in this probably-only-exciting-to-Tryggvi software
engineering theory, a word of advice from DeRemer's Law: "What applies
to small systems does not apply to large ones". Breaking things down in
this way may not end up being a more manageable system. It will be a
bumpy road because experiences may not necessarily be shareable between
micro-systems or for the overall system, but this architecture may
instead invite a bigger development community, i.e. a bigger team that
can share the burden. Hopefully some with experience in large system
designs and others with experience in smaller systems, so different
experiences and interests are going to be needed.

Alright enough with theory! What are we going to do?

To sum the architectural change up, we want to: *Centralize data and
De-centralize presentation*.

If you want to follow along you can take a look at the images in the
OpenSpending Enhancement Proposal #01:
https://github.com/openspending/osep/blob/gh-pages/01-approach-and-architecture-of-openspending.md

This makes it quite difficult to talk about the OpenSpending platform
because there will be no "central platform" per se, only a central
repository of data, plus some subdomains to expose services. It's
probably therefore better to think about the software itself in terms of
the OpenSpending repos on github.

The overall architecture is that we would split OpenSpending into three
"layers" (marked in the images as mostly OpenSpending stuff, some
OpenSpending and some "others" stuff, and mostly "others" stuff):

* Input and storage of data
* Information retrieval and analysis
* Presentation and external sites

We propose that we put most of our focus into receiving
and storing raw data. That's the underlying building block. Without the
data we have nothing. So: rock-solid input of data and standardized formats
to make it all useful outside the context a single user wants to use it
in. The focus here would be on a Budget Data Package importer
(standardized data), and on storing the Budget Data Packages in something
like flat file storage (S3). Hook all of that into some permission
system we devise, validation etc. This does not mean that we can just
ignore everything that's not a Budget Data Package, so we'll need another
importer which for example would map onto a Budget Data Package, at
least to begin with, but imo we should focus on the BDP importer.
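
To make "validate, then store as a flat file" a bit more concrete, here
is a minimal sketch in Python. Everything in it is an assumption for
illustration: the required field names, the validate_descriptor and
store_package helpers, and the local directory standing in for S3. It is
not the Budget Data Package spec and not existing OpenSpending code.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical minimal requirements; the real Budget Data Package
# spec defines its own (richer) set of fields.
REQUIRED_PACKAGE_FIELDS = ("name", "resources")
REQUIRED_RESOURCE_FIELDS = ("path", "currency")


def validate_descriptor(descriptor):
    """Return a list of human-readable problems (empty means valid)."""
    problems = []
    for field in REQUIRED_PACKAGE_FIELDS:
        if field not in descriptor:
            problems.append("missing package field: %s" % field)
    for index, resource in enumerate(descriptor.get("resources", [])):
        for field in REQUIRED_RESOURCE_FIELDS:
            if field not in resource:
                problems.append(
                    "resource %d is missing field: %s" % (index, field))
    return problems


def store_package(descriptor, storage_root):
    """Validate the descriptor, write it to flat file storage and
    return the path. A local directory stands in for an S3 bucket."""
    problems = validate_descriptor(descriptor)
    if problems:
        raise ValueError("; ".join(problems))
    target = Path(storage_root) / descriptor["name"] / "datapackage.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(descriptor, indent=2))
    return target


if __name__ == "__main__":
    package = {
        "name": "example-budget",
        "resources": [{"path": "budget.csv", "currency": "USD"}],
    }
    with tempfile.TemporaryDirectory() as root:
        print(store_package(package, root))
```

The point of the sketch is only that nothing gets into storage without
passing validation first; a permission check would presumably sit in
front of store_package in the same way.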

Then we would also put a lot of power into the analysis of the data and
making it accessible (but in such a way that it supports various and
distributed presentation modes). However here we would expect others to
also do things which wouldn't be in the OpenSpending github organisation
repos. So the OLAP cube OpenSpending now imports, models and maps
everything into, would happen in this area, but of course in a different
way than what it currently does. We would now base everything off of
standardized data and automatically import into the OLAP cube and
perhaps build standardized aggregation queries and cache them properly.
There could of course be others who want to use something else like
Hadoop or something to analyse the data in a different way and they
could. The raw data we serve (previous layer), is centralized but
anybody can use it in the way they like.
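
As a rough illustration of "standardized aggregation queries, cached
properly", here is a small Python sketch. The rows, the dimension names
and the aggregate function are made up for the example, and an in-memory
lru_cache stands in for whatever cache we would really use. A real OLAP
cube would look quite different.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical rows as they might come out of a standardized import;
# the values are invented for illustration only.
ROWS = (
    {"year": 2013, "function": "health", "amount": 100.0},
    {"year": 2013, "function": "education", "amount": 80.0},
    {"year": 2014, "function": "health", "amount": 110.0},
    {"year": 2014, "function": "education", "amount": 90.0},
)


@lru_cache(maxsize=128)
def aggregate(dimension):
    """Sum 'amount' grouped by one dimension, cached per dimension."""
    totals = defaultdict(float)
    for row in ROWS:
        totals[row[dimension]] += row["amount"]
    # Return an immutable, sorted result so that cached values cannot
    # be modified by callers.
    return tuple(sorted(totals.items()))


if __name__ == "__main__":
    print(aggregate("year"))
    print(aggregate("year"))  # second call is served from the cache
    print(aggregate.cache_info())
```

Because the set of aggregations is standardized, the cache key can be as
simple as the dimension name. The sketch ignores invalidation, which we
would need as soon as new data is imported.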

The services we would focus on would be an OLAP cube with standard
aggregations, search (a different implementation probably than what we
currently have) and SQL-like arbitrary queries to provide more
professional access to the data where you could join datasets and things
like that. We would front this with an API so the analysis bit isn't
directly accessible, i.e. we won't give anybody direct access to backend
systems (but do it via an API micro-service), just like we wouldn't give
anybody direct access to the data storage.
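
A minimal sketch of what "fronting the backends with an API" could mean,
in Python. The operation names, the handle_request function and the two
stand-in backends are all hypothetical; the only idea it tries to show is
that callers can reach a fixed list of public operations and nothing
else.

```python
import json


def _aggregate_backend(params):
    # Stand-in for a call into the OLAP cube.
    return {"dimension": params.get("dimension", "year"), "cells": []}


def _search_backend(params):
    # Stand-in for a call into the search index.
    return {"query": params.get("q", ""), "hits": []}


# Only the operations listed here are reachable from outside.
PUBLIC_OPERATIONS = {
    "aggregate": _aggregate_backend,
    "search": _search_backend,
}


def handle_request(operation, params):
    """Return (status_code, json_body) for a public API call."""
    backend = PUBLIC_OPERATIONS.get(operation)
    if backend is None:
        return 404, json.dumps({"error": "unknown operation"})
    return 200, json.dumps(backend(params))


if __name__ == "__main__":
    print(handle_request("search", {"q": "health"}))
    print(handle_request("drop_tables", {}))
```

Swapping a backend (say, a different search implementation) then only
means changing what the dictionary entry points to; the exposed
operation names stay the same.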

The presentation layer is where we would put the least focus, except
on supporting services/software solutions. We would leave this
layer mostly up to "others" (which would still probably be part of the
OpenSpending community). By that we mean that we wouldn't have many
repos for presentation things (and would move those we now have elsewhere)
except for a few very general or specific ones. General ones would be
templates that people can use to build their own budget visualisation
sites (like Where Does My Money Go?) or plugins like our WordPress
plugins or the CKAN plugins (on the principle of trying to reduce the
information hiding/exposure risk). In this layer we would also have the
OpenSpending.org website but a simpler version of the current one.
Basically just as a frontend for or link to some micro-services,
introduction to the project etc.

In between the reading of data (either raw data or analysis results via
the reading api) we would provide some budget visualisations, mostly via
OpenSpendingJS, but it would be open for others to implement their own.
I put these into the information retrieval and analysis layer because
they wouldn't be able to stand on their own and would be used by the
presentation layer and require special knowledge of budget
visualizations (e.g. adjusting for inflation when comparing across years
etc.). So in a way it's a reading thing and something I expect us to
provide most of the core part of, but yeah it could also be in the
presentation layer. It doesn't really matter that much where we put it
as long as we all understand what our role as a developer community is
in providing these services.
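
To show the kind of "special knowledge" I mean, here is a tiny Python
sketch of adjusting for inflation before comparing years. The price
index values and the to_base_year_prices helper are invented for the
example (they are not real figures and not OpenSpendingJS code).

```python
# Hypothetical price index (base year 2010 = 100); not real figures.
PRICE_INDEX = {2010: 100.0, 2012: 105.0, 2014: 110.0}


def to_base_year_prices(amount, year, base_year, index=PRICE_INDEX):
    """Express a nominal amount from `year` in `base_year` prices."""
    return amount * index[base_year] / index[year]


if __name__ == "__main__":
    # 220 spent in 2014, expressed in 2010 prices.
    print(to_base_year_prices(220.0, 2014, 2010))
```

A visualisation that skips this step would show spending growing between
2010 and 2014 even when, in real terms, it stayed flat.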

I think this email has already become too long so I'm going to stop for
now and give you some room to think and contemplate but there are a lot
of decisions we need to think about going forward, if this is something
we agree on:

* Integration layer between components (HTTP/Message queues/Carrier
pigeons)
* How to build common components that can be re-used by all (or what we
can re-use from others)
* Preferred development language of components (preferred, not one to
rule them all)
* How and where to start? (what components should we start work on)
* Migration of older datasets (existing ones in OpenSpending, can we
focus on BDP at all?)
* Design of each component (probably separately by those who want to
work on it)
* Code and communication conventions for the dev community (common
guidelines - we're a group!)
* Lots of other things

So, isn't it best to say that I'm interested in a discussion by asking
the question: What do you think?

/Tryggvi