[ckan-dev] Started work on the 'webstore' (a datastore with web API)

Mon Jul 18 11:59:51 UTC 2011

Hi,

it seems that David has completed a working prototype [1] of the
postgres-based version of this but I'm still torn over the pros and
cons.

On Sun, Jul 17, 2011 at 1:01 PM, Nick Stenning <nick at whiteink.com> wrote:
> I agree that hacking together some combination of sqla and pysqlite is
> probably not a good solution. However, I'm really strongly in favour of
> keeping sqlite as the backend if at all possible, partially for personal
> reasons (I do almost all my data hacking with sqlite databases and would
> love to be able to upload them direct to webstore) but mainly for reasons of
> isolation.

I also like sqlite as a backend since it reduces the whole system to
"just a bunch of files", which I find very convenient. On the other
hand, the schema isolation, threading issues and even performance
might speak for psql.

> (i.e. users could do SELECT * FROM cra.cra2010 cra JOIN
> "openspending.cofog".cofog cofog1 ON cra.cofog1_id = cofog1.id)
>
> Are you suggesting here that people would be able to SELECT across different
> webstore databases? If so, I'm strongly in opposition to this idea. I
> recognise the elegance in principle, but in practice I think evidence is the
> supposed reuse of data doesn't work ... And I think OpenSpending can testify
> to that. We should prevent users from building queries that rely on *other
> people's schemas*.

Probably true, this is prone to fail. The current implementation does
not allow this, btw.

> So, having told you what I'm not in favour of, here's another suggestion:
> There's one very easy way of ensuring read-only queries against an SQLite
> databases, and that's to connect to them as a unix user with read-only
> permissions. We can then (presumably) catch any errors that arise as a
> result of trying to run read-write queries against them and present a nicer
> error message to the user.

The problem with this is again architectural: flask runs things in a
multi-threaded WSGI pipeline, so you don't have the classical CGI
option of just becoming "nobody" for the time of a request. But maybe
I'm missing something here?

> I'm thinking that maybe webstore could spawn a process as a different user,
> dependent on the authz status of the user making the API request? One user
> has RW access to the db, one has RO.

Do you mean a secondary background daemon or just shelling out to
something? That would be overdoing it IMO, then I'd prefer PSQL.

> There's also some discussion about ensuring that only RO transactions are
> happening at a higher level by monkeypatching sqla
> here: http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg21801.html
> What are your thoughts?
> N

- Friedrich

[1] https://github.com/okfn/webstore/commit/b50b8ab125fb846c01e8c8a13186665bbd539dc3

-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/

http://twitter.com/pudo
http://pudo.org