[openspending-dev] Micro-services: OpenSpending's future architecture
Friedrich Lindenberg
friedrich.lindenberg at okfn.org
Mon Dec 22 17:16:52 UTC 2014
Hey Tryggvi,
thanks for writing up these thoughts, I think this is an incredibly
valuable discussion for us to have around OpenSpending. In many ways, I
agree with you: I also believe that OpenSpending would be a better piece of
software if it was more modular.
That change would help us to define great APIs, make the code base clearer
and perhaps it would also lead to more contributions - although I can't
help but consider that a hypothesis, rather than a natural conclusion.
I think the main reason OS hasn't gathered massive numbers of contributors
is that the intersection of people who a) know about BI to some extent, b)
care about public finance from a civic point of view, c) care about open
source and d) aren't into starting their own thing is just very, very small.
What you are describing is a very appealing vision - a notion of small
pieces loosely joined. It represents the best of FOSS design.
Unfortunately, I'm not sure that FOSS design is what's going to help
OpenSpending have real-world impact. I believe your choice of primary
target audience (developers and data wranglers) is determined by OKF's
financial constraints and not by looking at the kinds of problems which
OpenSpending could help to solve.
I also think that this particular part of the FOSS ethic badly needs to be
reformed. And I think that OpenSpending could be a great case study
in doing so.
In the open source community, the idea that centralisation is bad has
turned into a sort of anachronistic dogma. While the commercial world has
discovered that centralised offerings can provide great value to users (and
advertisers), that realisation is semi-forbidden in open source land. If
everything must follow the UNIX philosophy, then the thing that's really
left for the open source community to innovate in is systems stuff, i.e.
actual UNIX.
There are two exceptions to this: Wikimedia, mostly because the attempts to
decentralise Wikipedia have been so horribly bad (anyone remember
levitation?), and the large scale dissemination of pornographic movies (aka
BitTorrent). The latter is being eaten up by centralised services like
Netflix, Spotify, RedTube.
What's underlying this is that the open source community still hasn't found
a way to provide web-based, end user-facing services. If anything will make
open source largely irrelevant to the web at large, it's this.
[[
A random example: Mozilla is trying to compete with Apple and Google on
building a smartphone system. It turns out, though, that a smartphone
system isn't really a piece of software that runs on a handset. It's a
large set of orchestrated services (location, profile, social, ...) that
your handset connects you to. When I attended their summit last year, they
had internal screaming matches about whether Mozilla should provide these
(and thus become a large-scale data hoarder, just like its competitors).
Similarly, things like Diaspora just die because they represent bad service
design. Redecentralize [1] is a list of things that I am deeply sympathetic
with on an ideological level - but I don't think I (or most of my friends)
actually use a single one of these tools.
[1] https://github.com/redecentralize/alternative-internet
]]
So what should FOSS do? I believe that we need to start being serious about
providing open source, openly licensed, centralised services. These
services may be provided by open source platforms, but the platform in
itself is just not enough.
Technologists - especially us at OpenSpending - have this notion that we
can get away with just providing a platform. Others will then use it to
provide end-user services with our platform's data. This has actually
worked at least once, with OpenStreetMap.
But I just can't see very much evidence that it actually applies to
OpenSpending. The people who provide analytical services in this field -
let's name SpendNetwork and OpenGov.com - don't actually need to access our
large repository of data (or our APIs). Their customers are cities, and
these cities bring their own data (and APIs are easy to code).
This makes OpenSpending unlike OpenStreetMap, and it makes developers an
unrealistic and unwilling target audience for the project. I think the
budgetary constraints on OpenSpending have led to a shift in thinking. The
discussion you're now having is not what problems need to be solved, but:
which ones are cheap to solve. Putting the code for a bunch of APIs on
GitHub and storing lots of CSVs on S3 is incredibly cheap, I'm just not
sure whose problem it solves.
OpenSpending could be a strong open source service, if it did two things:
a) actually start thinking even more about who its end-users are and start
to provide them value, and b) convince a set of funders to financially
support the site until something fundamentally better is available.
OpenSpending, if it is addressed (directly and through its satellites) at
citizens, journalists and policy analysts, is a public service. It needs to
find a funding mode that reflects this: grant funding, perhaps even public
funding.
OpenSpending, if it is addressed at a group of "other developers" who
magically need its services and data yet don't face the same kind of
constraints OKF has and instead provide great public services, is a
fiction.
So, in summary: yes, let's make OS a modular application, because it's the
right thing to do. But let's not adopt the idea that a modular set of tools
is a replacement for a user-facing web service in 2014. Let's find a model
for OS to have an impact that doesn't involve the open source narrative
prop of "other developers" who don't have our problems.
I apologise for the length of my response.
Cheers,
Friedrich
On Mon, Dec 22, 2014 at 3:43 PM, Tryggvi Björgvinsson <
tryggvi.bjorgvinsson at okfn.org> wrote:
> Hi all,
>
> Warning, long and quite theoretical, but still important to discuss. For
> the short version please see the updated OpenSpending Enhancement
> Proposal #1:
>
> https://github.com/openspending/osep/blob/gh-pages/01-approach-and-architecture-of-openspending.md
>
> Conway's Law says: "A system reflects the organizational structure that
> built it".
>
> In its essence this is about how communications between team members
> affects the architecture of the software they're building. Conway's Law
> is very important for managers who are organizing the team. It tells
> them to organize the team so that its communication lines reflect
> the software. If the system architecture changes, the team also needs to
> change.
>
> The problem with these recommendations is that they don't reflect open
> source development. In open source development projects, there is no
> manager responsible for putting the team together or re-organizing the
> team members. In open source software development the team is organic,
> self-selected. People join and help out (in the capacity they can) when
> and because they're interested.
>
> So from an open source software development point of view we have to
> turn this around. *The system should reflect the organizational
> structure that can build it*. It's my hypothesis or conjecture or
> whatever. What I'm trying to say is that it's the other way around in
> open source software development: the system needs to be designed to
> accommodate for the community that uses and will therefore build it.
>
> OpenSpending's community is complex. We all approach OpenSpending from a
> different perspective and different use cases. We all have specialised
> needs that have to do with how we interact with OpenSpending and what we
> expect to get from it. The current system however reflects an
> organizational structure of a close group of team members, who
> communicate internally (which is kind of how OpenSpending came into
> being). But that's not what OpenSpending should be. We're an open source
> software project. As such we create software that should be able to
> service many stakeholders, who communicate together publicly (scratch
> the itch and all that).
>
> So to cut this introduction short and save it from being too academic
> and boring. I want to propose a different system architecture for
> OpenSpending, one that invites a bigger community to participate and
> more aptly reflects the organizational structure we want.
>
> This new architecture is a micro-services solution. Lots of smaller
> components that can talk together via defined protocols. A system that
> can be extended to scratch itches and be simple enough to allow people
> to jump into a small project without having to dive through a mountain
> of (directly) unrelated code. So this takes David Parnas' information
> hiding to the system level (we're not inventing the wheel here).
>
> This triggers more software engineering theory goodness in me (which
> may or may not interest you as much as me). We've kind of covered
> Parnas' Law: "Only what is hidden can be changed without risk". Parnas'
> Law is a two-edged sword. It would allow us to make changes without much
> risk of causing a butterfly effect within the OpenSpending code base but
> it also makes it very important for us to have a good think about what
> we expose (so we don't change it very often because that's risky). So
> the interface is very important to get right but the flow behind it is
> something we can iterate fast on.
>
> Another law that touches this architectural change is Lanergan's Law:
> "The larger and more decentralized an organization, the more likely it
> is that it has reuse potential". It's still worth iterating even though
> that's what we developers usually do. The micro-services should be as
> general as possible so that they can get re-used. Who knows if they can
> get used outside the OpenSpending community and we'll have an even
> bigger group of people helping out with maintenance and development.
> Again this is kind of a reverse of the law. Let's design with reuse
> potential so we can have a large and more decentralized organization.
>
> And lastly in this probably-only-exciting-to-Tryggvi software
> engineering theory, a word of advice from DeRemer's Law: "What applies
> to small systems does not apply to large ones". Breaking things down in
> this way may not end up as being a more manageable system. It will be a
> bumpy road because experiences may not necessarily be shareable between
> micro-systems or for the overall system, but this architecture may
> instead invite a bigger development community, i.e. a bigger team that
> can share the burden. Hopefully some with experience in large system
> designs and others with experience in smaller systems, so different
> experiences and interests are going to be needed.
>
> Alright enough with theory! What are we going to do?
>
> To sum the architectural change up, we want to: *Centralize data and
> De-centralize presentation*.
>
> If you want to follow along you can take a look at the images in the
> OpenSpending Enhancement Proposal #01:
>
> https://github.com/openspending/osep/blob/gh-pages/01-approach-and-architecture-of-openspending.md
>
> This makes it quite difficult to talk about the OpenSpending platform
> because there will be no "central platform" per se, only a central
> repository of data, plus some subdomains to expose services. It's
> probably therefore better to think about the software itself in terms of
> the OpenSpending repos on github.
>
> The overall architecture is that we would split OpenSpending into three
> "layers" (marked in the images as mostly OpenSpending stuff, some
> OpenSpending and some "others" stuff, and mostly "others" stuff):
>
> * Input and storage of data
> * Information retrieval and analysis
> * Presentation and external sites
>
> We propose that we put most of our focus into receiving
> and storing raw data. That's the underlying building block. Without the
> data we have nothing. So rock-solid input of data, standardized formats
> to make it all useful outside the context a single user wants to use it
> in. So the focus here would be in a Budget Data Package importer
> (standardized data). And storing the Budget Data Packages in something
> like a flat file storage (s3). Hook all of that into some permission
> system we devise and validation etc. This does not mean that we can just
> ignore everything that's not a Budget Data Package, so we'll need another
> importer which for example would map onto a Budget Data Package, at
> least to begin with, but imo we should focus on the BDP importer.
>
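As a rough illustration of the "other importer" idea above, here is a minimal Python sketch that maps arbitrary CSV columns onto a standardized set of fields and validates each row. The field names (`year`, `description`, `amount`), the sample data and the function names are assumptions made for this example; they are not taken from the Budget Data Package specification or from the OpenSpending code base.

```python
import csv
import io

# Hypothetical target field names; NOT taken from the Budget Data Package spec.
STANDARD_FIELDS = ("year", "description", "amount")


def map_row(row, mapping):
    """Rename source columns to the standard fields and validate the values."""
    mapped = {field: (row[source] or "").strip() for field, source in mapping.items()}
    missing = [f for f in STANDARD_FIELDS if not mapped.get(f)]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    mapped["year"] = int(mapped["year"])
    mapped["amount"] = float(mapped["amount"])
    return mapped


def import_csv(text, mapping):
    """Return (valid_rows, errors) for a CSV document given as a string."""
    valid, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            valid.append(map_row(row, mapping))
        except (KeyError, ValueError) as exc:
            errors.append((line_no, str(exc)))
    return valid, errors


# Made-up source file with non-standard column names and one incomplete row.
raw = "Jahr,Titel,Betrag\n2014,Schools,1200.50\n2014,Roads,\n"
mapping = {"year": "Jahr", "description": "Titel", "amount": "Betrag"}
rows, errors = import_csv(raw, mapping)
```

Collecting errors per line instead of failing on the first bad row is one possible choice; a stricter importer might reject the whole file.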
> Then we would also put a lot of power into the analysis of the data and
> making it accessible (but in such a way that it supports various and
> distributed presentation modes). However here we would expect others to
> also do things which wouldn't be in the OpenSpending github organisation
> repos. So the OLAP cube OpenSpending now imports, models and maps
> everything into, would happen in this area, but of course in a different
> way than what it currently does. We would now base everything off of
> standardized data and automatically import into the OLAP cube and
> perhaps build standardized aggregation queries and cache them properly.
> There could of course be others who want to use something else like
> Hadoop or something to analyse the data in a different way and they
> could. The raw data we serve (previous layer), is centralized but
> anybody can use it in the way they like.
>
> The services we would focus on would be an OLAP cube with standard
> aggregations, search (a different implementation probably than what we
> currently have) and SQL-like arbitrary queries to provide more
> professional access to the data where you could join datasets and things
> like that. We would front this with an API so the analysis bit isn't
> directly accessible, i.e. we won't give anybody direct access to backend
> systems (but do it via an API micro-service), just like we wouldn't give
> anybody direct access to the data storage.
>
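The idea of fronting the analysis backend with an API and caching standard aggregations could look roughly like the Python sketch below. It is only an illustration: the in-memory tuple stands in for the OLAP cube, and the entries, dimension names and the `aggregate` function are all made up for this example.

```python
from collections import defaultdict
from functools import lru_cache

# Toy stand-in for the analysis backend; the rows and field names are made up.
_ENTRIES = (
    {"year": 2013, "function": "education", "amount": 100.0},
    {"year": 2013, "function": "health", "amount": 250.0},
    {"year": 2014, "function": "education", "amount": 120.0},
    {"year": 2014, "function": "health", "amount": 240.0},
)
_ALLOWED_DIMENSIONS = frozenset({"year", "function"})


@lru_cache(maxsize=128)
def aggregate(dimension):
    """Public entry point: sum amounts grouped by one whitelisted dimension."""
    if dimension not in _ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    totals = defaultdict(float)
    for entry in _ENTRIES:
        totals[entry[dimension]] += entry["amount"]
    # Return an immutable value so cached results cannot be altered by callers.
    return tuple(sorted(totals.items()))


by_year = aggregate("year")
by_year_again = aggregate("year")  # served from the cache
```

Callers only ever see `aggregate`; the storage behind it can change (a real cube, a different cache) without changing what is exposed.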
> The presentation layer is where we would put least of the focus, except
> only on supporting services/software solutions. We would leave this
> layer mostly up to "others" (which would still probably be part of the
> OpenSpending community). By that we mean that we wouldn't have many
> repos for presentation things (and move those we now have elsewhere)
> except for a few very general or specific ones. General ones would be
> templates that people can use to build their own budget visualisation
> sites (like Where Does My Money Go?) or plugins like our WordPress
> plugins or the CKAN plugins (on the principle of trying to reduce the
> information hiding/exposure risk). In this layer we would also have the
> OpenSpending.org website but a simpler version of the current one.
> Basically just as a frontend for or link to some micro-services,
> introduction to the project etc.
>
> In between the reading of data (either raw data or analysis results via
> the reading api) we would provide some budget visualisations, mostly via
> OpenSpendingJS, but it would be open for others to implement their own.
> I put these into the information retrieval and analysis layer because
> they wouldn't be able to stand on their own and would be used by the
> presentation layer and require special knowledge of budget
> visualizations (e.g. adjusting for inflation when comparing across years
> etc.) So in a way it's a reading thing and something I expect us to
> provide most of the core part of, but yeah it could also be in the
> presentation layer. It doesn't really matter that much where we put it
> as long as we all understand what our role as a developer community is
> in providing these services.
>
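To make the "adjusting for inflation when comparing across years" point concrete, here is a small Python sketch that converts nominal amounts into the prices of a chosen base year using a price index. The index values and spending figures are invented for the example, and `to_base_year` is a hypothetical helper, not part of OpenSpendingJS.

```python
# Hypothetical price index with made-up numbers, keyed by year.
PRICE_INDEX = {2012: 100.0, 2013: 102.0, 2014: 104.04}


def to_base_year(amount, year, base_year, index=PRICE_INDEX):
    """Express a nominal amount from `year` in the prices of `base_year`."""
    return amount * index[base_year] / index[year]


# Made-up nominal spending per year.
nominal = {2012: 500.0, 2013: 505.0, 2014: 510.0}
real_2014 = {y: round(to_base_year(a, y, 2014), 2) for y, a in nominal.items()}
```

In this toy data nominal spending rises every year, but expressed in 2014 prices it falls, which is the kind of difference a visualisation would otherwise hide.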
> I think this email has already become too long so I'm going to stop for
> now and give you some room to think and contemplate but there are a lot
> of decisions we need to think about going forward, if this is something
> we agree on:
>
> * Integration layer between components (HTTP/Message queues/Carrier
> pigeons)
> * How to build common components that can be re-used by all (or what we
> can re-use from others)
> * Preferred development language of components (preferred, not one to
> rule them all)
> * How and where to start? (what components should we start work on)
> * Migration of older datasets (existing ones in OpenSpending, can we
> focus on BDP at all?)
> * Design of each component (probably separately by those who want to
> work on it)
> * Code and communication conventions for the dev community (common
> guidelines - we're a group!)
> * Lots of other things
>
> So, isn't it best to say that I'm interested in a discussion by asking
> the question: What do you think?
>
> /Tryggvi