[openspending-dev] Porting OpenSpending to Flask

Mon Dec 22 13:37:58 UTC 2014

Hey Tryggvi,

thanks for the comments, replies inline!

On Mon, Dec 22, 2014 at 1:06 PM, Tryggvi Björgvinsson <
tryggvi.bjorgvinsson at okfn.org> wrote:

> I've been hatching the micro-services plan (which Rufus hinted at) in my
> head for a long time and after our last developer meeting, Rufus Pollock,
> Paul Walsh and I sat down to combine and align our thoughts and ideas so
> that we could share them with the list for a better and more thorough
> discussion (my initial thought was we could do it at the next developer
> meeting).
>

I'm keen to learn more about the lines of separation which you have in
mind.

> So, your email comes at a great time. We can just kickstart the discussion
> now to align all of us and see how what Rufus, Paul and I have been
> thinking about resonates with the rest of the list. To keep things clean
> and because this is important, I'll describe what we're thinking and our
> approach in a separate thread but I just wanted to say that it's probably
> not time well spent to begin with the flask port before we've agreed on the
> overall approach and even started moving some of the pieces to
> micro-services because this is going to change the system quite a lot.
>

I'm about a hundred commits in, and would beg to differ: I think it's a
great way to get rid of some technical debt before we venture into new
stuff. So far, the code has only become cleaner, more readable and easier
to handle through this :)

> On Sun 21 Dec 2014 22:42, Friedrich Lindenberg wrote:
>
> I agree that it'd be cool to split up OS into micro-services, and I'd say
> that only gets easier once we have everything divided into blueprints.
>
>
> Yes. I agree. The idea with the micro-services is to "unixify"
> OpenSpending: make independent things (where each one does only a few
> things, and does them well) work well together, and then allow people to
> tap into OpenSpending at different end-points. If you want the raw data you
> can access that, but you can also get analysis results for standard
> aggregation queries, etc.
>

Sounds very, very good, but let's keep those people at the front of the
discussion :) OpenSpending used to be much more modular, but it simply
didn't generate much benefit with regards to allowing people to upload and
visualise their budgets. So if we do it, we should make sure we're doing it
in a way that actually makes it more attractive to users -- not just in
order to conform to some diagram.

> As for OSEP2, I want to focus on the bits that generate end-user value. My
> interpretation is that raw data storage would be useful, but less so than
> simple file upload. They may end up being the same thing, but with
> different attitudes :)
>
>
> I agree with you there as well. The raw data storage is still very
> important. Currently OpenSpending does not save the raw data but downloads
> it, imports it and then forgets about it. This means we cannot rebuild
> things if we notice a bug in the import or half the earth gets attacked by
> aliens and we lose all our backups (because if half the world gets
> destroyed, there will still be OpenSpending fanatics who want to continue
> to use our awesome software).
>

Agreed, very useful. I want to clean up my server, but if I touch a
directory called "misc" the provenance for half of the German datasets dies
:)

> I think the interface should still be a simple file upload even if in the
> backend we're storing a copy of the data, but I think we need to focus on
> standardized input (more about that later) for the future because it's not
> really helpful from a global user base point of view to have non-standard
> data in OpenSpending.
>

I think I disagree with you on this. I feel that "standardised input" is
something we should ask of governments that publish budget documents.
Advising government on data release is not a business I'm in any more; I'd
be keen to hear about OKF's efforts. For people who are cutting this stuff
out of PDF files, "standardised input" means more (and less obvious) data
wrangling. Again, this is something we asked people to do in the early days
of OS, and it didn't work well at all.

Instead, I think we need to remain fully flexible as to the shape of the
input data, but allow for more comprehensive semantic modelling of the
data. Obvious extensions would be hierarchies, categories for dimensions
(functional, economic, institutional), and allowing slightly more borked
data into the system.
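
To make that a bit more concrete, here is a rough sketch of what I mean.
All names in it (Dimension, hierarchy_path, load_row, the column names) are
made up for illustration and are not the current OpenSpending model code.
It assumes a mapping that accepts whatever columns the uploader has, lets a
dimension optionally carry a category and a parent, and flags broken rows
instead of rejecting them:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout; this is not the OpenSpending model API.
CATEGORIES = {"functional", "economic", "institutional"}


@dataclass
class Dimension:
    name: str                       # name used inside the model
    column: str                     # column in the uploaded file, whatever it is called
    category: Optional[str] = None  # optional fiscal classification
    parent: Optional[str] = None    # optional parent dimension, forming a hierarchy

    def __post_init__(self):
        if self.category is not None and self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def hierarchy_path(dimensions, name):
    """Return dimension names from the root of the hierarchy down to `name`."""
    by_name = {d.name: d for d in dimensions}
    path = []
    current = name
    while current is not None:
        if current in path:
            raise ValueError("cycle in hierarchy")
        path.append(current)
        current = by_name[current].parent
    return list(reversed(path))


def load_row(dimensions, amount_column, row):
    """Map one raw row onto the model, flagging problems instead of rejecting it."""
    entry = {d.name: row.get(d.column) for d in dimensions}
    # Empty or missing dimension values are recorded, not treated as fatal.
    problems = [d.name for d in dimensions if not row.get(d.column)]
    try:
        entry["amount"] = float(str(row.get(amount_column, "")).replace(",", ""))
    except ValueError:
        entry["amount"] = None
        problems.append("amount")
    entry["problems"] = problems
    return entry
```

The point of the sketch is that the semantics (category, hierarchy) live in
the mapping, so the input file can stay in whatever shape the uploader
already has.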

>  If I had to guess what the most pressing challenge is for the platform,
> I would go with domain-specific metadata. OS has apparently got 2000
> datasets now (massive jump, what happened?) - but it's near impossible to
> find out which areas are covered, which datasets are current and how one
> would update those that aren't.
>
> Yes, I really liked your idea of browsing the data from a metadata
> perspective instead of titles. I actually think that would be the way to go
> but I still have to digest and have a think about it some more. Would you
> be willing to elaborate a little (e.g. with mockups) what you're thinking?
>

I'm not clear on it myself. I've been working on [1] as a sort of fancy
front-end for a little survey of African budget transparency projects [2],
and I could imagine that a classification by phase, geography and sector
("procurement data about health care in South Africa") should be the
highest priority.

[1]: http://fierce-plains-8701.herokuapp.com/library/index.html
[2]: https://docs.google.com/spreadsheets/d/1o7OM-UL9hbX3fRkGQUcDEFIAHTOmxnu-pQI_2tKFHos/edit?usp=drive_web

>  More than that, the OS home page does a horrible job linking out to the
> cool satellite sites like Spending.jp, WDMMG, budzeti.ba, CameroonBudget
> or OffenerHaushalt. These shouldn't just be mentioned in random blog posts,
> but featured in the main system when people browse for budget data.
>
>
> Oh yes. I agree with you there as well. It's a real shame we don't have
> some sort of a "call home" feature and a registry of who's using
> OpenSpending datasets (meaning a non-editorial approach).
>

But we do know about most of them :) So I think we don't need a technical
solution, if we can have a library like the one above which also allows for
non-OS-tech projects to be linked out :)

>  Another random comment: reading the OS codebase today, I have to say
> that I haven't learned to love the BDP. At the moment, it's turning into a
> parallel, non-UI loading mechanism, when really the BDP should link into
> the process much more smoothly. One mechanism -- model/mapping or BDP
> should be the "truth", and the other one should map onto it. I'm really not
> sure what the best approach is.
>
>
> So more on the standard input thing. What you're talking about is exactly
> what I want. The BDP importer in the code base basically just converts the
> budget data package into an OpenSpending model (dimensions and attributes)
> at the moment. It's far from how I would like OpenSpending to treat the
> BDP, but it's still a more user-friendly way of loading it than what
> https://github.com/openspending/biab does (so it's only focussing on API
> at the moment).
>
> What we are proposing in our new approach (the micro-services thing) is
> that BDP will be "the truth" and others will have to map onto it (perhaps
> in their own external services). We might in the transitioning period have
> an unstructured importer as an "official" micro-service but hopefully we
> can design it in such a way that the end-result will still be a budget data
> package.
>
> I know this means we might actually lose information that's stored in
> OpenSpending (since some budget data has more information than what the BDP
> stores) but I don't think we're really after ALL the data, just the most
> important parts, and if we do need some more data, we can just propose that
> as an addition to the BDP spec or have a separate micro-service that stores
> and links to the budget data packages to provide more context ("official"
> or maintained by others).
>
> In any case, I believe settling on BDP as "the truth" is important if the
> datasets in OpenSpending are supposed to be useful for more people than
> only those who are building a visualisation on top of their own dataset.
>
> I welcome all comments but I'm now going to move over to starting the other
> thread (which is an email that will probably take me a while to write
> because I want to get it right with such a big suggestion/proposal).
>

Ok, waiting for that thread. My initial reaction is that we're agreed on
the goal: richer metadata inside of OS, which describes the data in a way
that is semantic towards fiscal information. On the other hand, BDP is a
data standard, not a domain model - so I don't think that BDP should be
designed such that it meets all the metadata needs we might have inside of
OS. This seems like a superficial match to me, which has a high risk of
making the whole platform less accessible.

Cheers,

- Friedrich