[ckan-dev] Views on data through the web

Thu Jun 16 11:42:30 UTC 2011

On Thu, Jun 16, 2011 at 11:16:42AM +0100, Rufus Pollock wrote:
> On 16 June 2011 00:25, Francis Irving <francis at scraperwiki.com> wrote:
> > Yep, I hope we can share the SQL data server.
> >
> > I think the core of it is the same for each of us - and one day we'll
> > both want multiple server scaling of some sort, and no doubt lots of
> > other things.
> >
> > It would make sense for it to be a separate open source project.
> 
> Agreed. I took a look at the code recently (I think it was
> scraperlibs/scraperwiki/sqlite.py and datastore.py in the same directory).
> Not sure I had a good understanding of what was going on though. How
> would we go about collaborating on turning this into a standalone lib
> (if we all wanted to do this)?

That's the client library.

The server is in scraperwiki/uml/dataproxy (annoyingly, the same name
you use for something else!)

There is an init script for launching it at uml/etc/init.d/dataproxy

If you have questions about it, please email the ScraperWiki Google
Group, or developers at scraperwiki.com, where Tom/Ross/Julian hang out.

> > Quite surprising that we've both settled on SQLite for this...
> > (Everybody else is busy talking about CouchDB and MongoDB...)
> 
> It's lightweight and portable and you can get it everywhere - plus we
> want lots of little dbs, not one mega-db. (Also, I've used CouchDB and
> MongoDB and I'm still rather dubious of their benefits over tried-and-tested
> SQL...)
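To make the "lots of little dbs, not one mega-db" idea concrete, here is a minimal sketch using Python's standard `sqlite3` module, assuming each dataset simply gets its own SQLite file under a shared directory. The directory layout, the helpers `dataset_db_path` and `open_dataset`, and the dataset names are hypothetical illustrations, not how ScraperWiki or CKAN actually lay out their stores.

```python
import os
import sqlite3
import tempfile


def dataset_db_path(root: str, name: str) -> str:
    """Map a dataset name to its own SQLite file (hypothetical layout)."""
    # Replace anything that isn't alphanumeric, '-' or '_' so the name is a safe filename.
    safe = "".join(c if c.isalnum() or c in "-_" else "_" for c in name)
    return os.path.join(root, safe + ".sqlite")


def open_dataset(root: str, name: str) -> sqlite3.Connection:
    """Open (creating the file if needed) the small database for one dataset."""
    os.makedirs(root, exist_ok=True)
    return sqlite3.connect(dataset_db_path(root, name))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    datasets = [("uk-spending", [("2010", 1.5)]), ("school-list", [("2011", 42.0)])]
    for name, rows in datasets:
        conn = open_dataset(root, name)
        try:
            conn.execute("CREATE TABLE IF NOT EXISTS data (year TEXT, value REAL)")
            conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
            conn.commit()
        finally:
            conn.close()
    # Each dataset lives in its own file, so they can be copied, served or deleted independently.
    print(sorted(os.listdir(root)))  # ['school-list.sqlite', 'uk-spending.sqlite']
```

Because every dataset is an ordinary file, moving one to another machine is a file copy, which is one reason this style suits many small datasets better than a single large server database.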

Yep, it's a product/market fit thing - we're generally not working on
large datasets like the Twitter firehose, so a distributed store with
fancy Hadoop MapReduce just isn't useful.

I'm quite glad we're using SQL - things like GROUP BY are simpler, and
known by more people, than the equivalent syntax in MongoDB,
especially for people who are fairly new to programming.
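As an illustration of how compact a GROUP BY aggregation is, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `payments` table, its columns, the helper `totals_by_department` and the sample rows are all made up for illustration and don't come from either project.

```python
import sqlite3


def totals_by_department(rows):
    """Sum payment amounts and count payments per department using GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE payments (department TEXT, amount REAL)")
        conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
        # One declarative query does the grouping, summing and counting.
        cur = conn.execute(
            "SELECT department, SUM(amount) AS total, COUNT(*) AS n "
            "FROM payments GROUP BY department ORDER BY department"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("Education", 100.0), ("Health", 250.0), ("Education", 50.0)]
    for department, total, n in totals_by_department(sample):
        print(department, total, n)
    # Education 150.0 2
    # Health 250.0 1
```

The whole aggregation is a single readable statement, which is the kind of thing a newcomer can pick up from almost any SQL tutorial.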

Francis