[openspending-dev] Porting OpenSpending to Flask

Wed Dec 24 13:21:14 UTC 2014

Hey all,

I just wanted to give a heads-up that the Flask port appears to be
functional now: it's passing tests locally and on Travis. In total, OS has
lost around 1,300 lines of active Python code (3,400 lines in total), though
some of that is due to my somewhat radical decision not to port version 1 of
the API.

If anyone wants to try out the branch, it makes sense to use a fresh
virtualenv, since many dependencies are no longer needed and I've bumped
the versions of the others.

https://github.com/openspending/openspending/pull/833 also has some
remaining tasks.

Let me know what you think!

- Friedrich

On Mon, Dec 22, 2014 at 2:37 PM, Friedrich Lindenberg <
friedrich.lindenberg at okfn.org> wrote:

> Hey Tryggvi,
>
> thanks for the comments, reply inline!
>
> On Mon, Dec 22, 2014 at 1:06 PM, Tryggvi Björgvinsson <
> tryggvi.bjorgvinsson at okfn.org> wrote:
>
>> I've been hatching the micro-services plan (which Rufus hinted at) in my
>> head for a long time, and after our last developer meeting Rufus Pollock,
>> Paul Walsh and I sat down to combine and align our thoughts and ideas so
>> that we could share them with the list for a better and more thorough
>> discussion (my initial thought was we could do it at the next developer
>> meeting).
>>
>
> I'm keen to learn more about the lines of separation which you have in
> mind.
>
>
>> So, your email comes at a great time. We can just kickstart the
>> discussion now to align all of us and see how what Rufus, Paul and I have
>> been thinking about resonates with the rest of the list. To keep things
>> clean and because this is important, I'll describe what we're thinking and
>> our approach in a separate thread but I just wanted to say that it's
>> probably not time well spent to begin with the flask port before we've
>> agreed on the overall approach and even started moving some of the pieces
>> to micro-services because this is going to change the system quite a lot.
>>
>
> I'm about a hundred commits in, and would beg to differ: I think it's a
> great way to get rid of some technical debt before we venture into new
> stuff. So far, the code has only become cleaner, more readable, more
> handleable through this :)
>
>
>> On sun 21.des 2014 22:42, Friedrich Lindenberg wrote:
>>
>> I agree that it'd be cool to split up OS into micro-services, and I'd say
>> that only gets easier once we have everything divided into blueprints.
>>
>>
>> Yes. I agree. The idea with the micro-services is to "unixify"
>> OpenSpending: Make independent things (where each one does only a few
>> things, and does them well) work well together, and then allow people to
>> tap into OpenSpending at different end-points. If you want the raw data
>> you can access that, but you can also get analysis results for standard
>> aggregation queries, etc.
>>
>
> Sounds very, very good, but let's keep those people at the front of the
> discussion :) OpenSpending used to be much more modular, but it simply
> didn't generate much benefit with regards to allowing people to upload and
> visualise their budgets. So if we do it, we should make sure we're doing it
> in a way that actually makes it more attractive to users -- not just in
> order to conform to some diagram.
>
>> As for OSEP2, I want to focus on the bits that generate end-user value.
>> My interpretation is that raw data storage would be useful, but less so
>> than simple file upload. They may end up being the same thing, but with
>> different attitudes :)
>>
>>
>> I agree with you there as well. The raw data storage is still very
>> important. Currently OpenSpending does not save the raw data but downloads
>> it, imports it and then forgets about it. This means we cannot rebuild
>> things if we notice a bug in the import or half the earth gets attacked by
>> aliens and we lose all our backups (because if half the world gets
>> destroyed, there will still be OpenSpending fanatics who want to continue
>> to use our awesome software).
>>
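
A minimal sketch of the "keep the raw data" idea discussed above, assuming
a content-addressed directory on disk. The names archive_raw and
import_rows are hypothetical and are not part of the OpenSpending code
base; the point is only that the import reads from the archived copy, so
it can be re-run after an import bug is fixed.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def archive_raw(data: bytes, archive_dir: Path) -> Path:
    """Store the untouched source bytes under their SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    target = archive_dir / digest
    if not target.exists():  # identical uploads are stored only once
        target.write_bytes(data)
    return target


def import_rows(raw_path: Path) -> List[List[str]]:
    """Hypothetical stand-in for the real import: split CSV-like text."""
    text = raw_path.read_text(encoding="utf-8")
    return [line.split(",") for line in text.splitlines() if line]


# Assumed example data, not a real dataset.
archive_dir = Path(tempfile.mkdtemp())
source = b"year,amount\n2014,1300\n"

stored = archive_raw(source, archive_dir)  # keep the raw copy first
rows = import_rows(stored)                 # then import from the copy
rows_again = import_rows(stored)           # a later re-import still works
```

Keying the archive on a digest is just one possible design choice; it
makes repeated uploads of the same file cheap and gives each raw file a
stable identifier that metadata could point at.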
>
> Agreed, very useful. I want to clean up my server, but if I touch a
> directory called "misc" the provenance for half of the German datasets dies
> :)
>
>
>> I think the interface should still be a simple file upload even if in the
>> backend we're storing a copy of the data, but I think we need to focus on
>> standardized input (more about that later) for the future because it's not
>> really helpful from a global user base point of view to have non-standard
>> data in OpenSpending.
>>
>
> I think I disagree with you on this. I feel that "standardised input" is
> something we should ask of governments that publish budget documents.
> Advising government on data release is not a business I'm in any more,
> but I'd be keen to hear about OKF's efforts. For people who are cutting
> this stuff out of PDF files, "standardised input" means more (and less
> obvious) data wrangling. Again, this is something we asked people to do in
> the early days of OS, and it didn't work well at all.
>
> Instead, I think we need to remain fully flexible as to the shape of the
> input data, but allow for more comprehensive semantic modelling of the
> data. Obvious extensions would be hierarchies, categories for dimensions
> (functional, economic, institutional), and allowing slightly more borked
> data into the system.
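
One way to picture the extensions mentioned above (hierarchies and
categories for dimensions) is the sketch below. It is only an
illustration under assumed names: Dimension, Classification and path are
hypothetical and do not reflect the actual OpenSpending model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Classification(Enum):
    """The three dimension categories named in the message."""
    FUNCTIONAL = "functional"
    ECONOMIC = "economic"
    INSTITUTIONAL = "institutional"


@dataclass
class Dimension:
    name: str
    # Optional, so loosely structured data can still be loaded.
    classification: Optional[Classification] = None
    # Optional link to a broader dimension, forming a hierarchy.
    parent: Optional["Dimension"] = None

    def path(self) -> List[str]:
        """Return the dimension names from the top of the hierarchy down."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Assumed example dimensions.
ministry = Dimension("ministry", Classification.INSTITUTIONAL)
department = Dimension("department", Classification.INSTITUTIONAL,
                       parent=ministry)
notes = Dimension("notes")  # uncategorised column, still accepted
```

Keeping both the category and the parent optional is what leaves the
input shape flexible: a dataset with no semantic annotation at all still
fits the same structure.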
>
>>  If I had to guess what the most pressing challenge is for the platform,
>> I would go with domain-specific metadata. OS has apparently got 2000
>> datasets now (massive jump, what happened?) - but it's near impossible to
>> find out which areas are covered, which datasets are current and how one
>> would update those that aren't.
>>
>> Yes, I really liked your idea of browsing the data from a metadata
>> perspective instead of titles. I actually think that would be the way to go
>> but I still have to digest and have a think about it some more. Would you
>> be willing to elaborate a little (e.g. with mockups) what you're thinking?
>>
>
> I'm not clear on it myself. I've been working on [1] as a sort of fancy
> front-end for a little survey of African budget transparency projects [2],
> and I could imagine that classification by phase, geography and sector
> ("procurement data about health care in South Africa") should be the
> highest priority.
>
> [1]: http://fierce-plains-8701.herokuapp.com/library/index.html
> [2]:
> https://docs.google.com/spreadsheets/d/1o7OM-UL9hbX3fRkGQUcDEFIAHTOmxnu-pQI_2tKFHos/edit?usp=drive_web
>
>>  More than that, the OS home page does a horrible job linking out to the
>> cool satellite sites like Spending.jp, WDMMG, budzeti.ba, CameroonBudget
>> or OffenerHaushalt. These shouldn't just be mentioned in random blog posts,
>> but featured in the main system when people browse for budget data.
>>
>>
>> Oh yes. I agree with you there as well. It's a real shame we don't have
>> some sort of a "call home" feature and a registry of who's using
>> OpenSpending datasets (meaning a non-editorial approach).
>>
>
> But we do know about most of them :) So I think we don't need a technical
> solution, if we can have a library like the one above which also allows for
> non-OS-tech projects to be linked out :)
>
>>  Another random comment: reading the OS codebase today, I have to say
>> that I haven't learned to love the BDP. At the moment, it's turning into a
>> parallel, non-UI loading mechanism, when really the BDP should link into
>> the process much more smoothly. One mechanism -- model/mapping or BDP --
>> should be the "truth", and the other one should map onto it. I'm really not
>> sure what the best approach is.
>>
>>
>> So more on the standard input thing. What you're talking about is exactly
>> what I want. The BDP importer in the code base basically just converts the
>> budget data package into an OpenSpending model (dimensions and attributes)
>> at the moment. It's far from how I would like OpenSpending to treat the
>> BDP, but it's still a more user-friendly way of loading it than what
>> https://github.com/openspending/biab does (so it's only focussing on the
>> API at the moment).
>>
>> What we are proposing in our new approach (the micro-services thing) is
>> that BDP will be "the truth" and others will have to map onto it (perhaps
>> in their own external services). We might in the transitioning period have
>> an unstructured importer as an "official" micro-service but hopefully we
>> can design it in such a way that the end-result will still be a budget data
>> package.
>>
>> I know this means we might actually lose information that's stored in
>> OpenSpending (since some budget data has more information than what the BDP
>> stores) but I don't think we're really after ALL the data, just the most
>> important parts, and if we do need some more data, we can just propose that
>> as an addition to the BDP spec or have a separate micro-service that stores
>> and links to the budget data packages to provide more context ("official"
>> or maintained by others).
>>
>> In any case, I believe settling on BDP as "the truth" is important if the
>> datasets in OpenSpending are supposed to be useful for more people than
>> only those who are building a visualisation on top of their own dataset.
>>
>> I welcome all comments but I'm now going to move over to starting the
>> other thread (which is an email that will probably take me a while to write
>> because I want to get it right with such a big suggestion/proposal).
>>
>
> Ok, waiting for that thread. My initial reaction is that we're agreed on
> the goal: richer metadata inside of OS, which describes the data in a way
> that is semantic towards fiscal information. On the other hand, BDP is a
> data standard, not a domain model - so I don't think that BDP should be
> designed such that it meets all the metadata needs we might have inside of
> OS. This seems like a superficial match to me, which has a high risk of
> making the whole platform less accessible.
>
> Cheers,
>
> - Friedrich
>