[wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

Rufus Pollock rufus.pollock at okfn.org
Wed Dec 1 11:05:03 UTC 2010


On 29 November 2010 17:19, Francis Irving <francis at flourish.org> wrote:
> Thanks for that writeup Friedrich, very interesting.
>
> Two things:
>
> 1) I'd love to see your long technical pro/con email from before. I
> can't find it in the mailing list archive, did you send it there?

Friedrich prepared a summary of the different options (SQL, MONGO,
RDF) on the main wdmmg pad <http://okfnpad.org/wdmmg> and I've inlined
the summary below (emphasis: still in progress).

> 2) I'm wondering if a lightweight linked CSV or linked JSON can help.
> By this I mean using URLs as attribute values, and even keys, for
> fields which refer to other types (for which there is an ontology, or
> you feel like making one). That would provide the forward compatible
> hook, that William refers to.

This is definitely a possibility, and it is something that Will
Waites explored back in March when using Mongo as an RDF store:

<http://wwaites.posterous.com/mongo-as-an-rdf-store>
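
As a rough illustration of the "linked JSON" idea Francis describes, here
is a minimal sketch in Python. The example.org URLs, the field names and
the helper functions are all hypothetical, invented only to show the
shape: plain JSON in which some keys and values are URLs pointing at an
ontology or at other records.

```python
import json

# Hypothetical ontology base URL, for illustration only.
ONT = "http://example.org/wdmmg/ontology/"

# An entry whose keys and values are URLs wherever they refer to other types.
entry = {
    "amount": 1200000,
    ONT + "classifier": "http://example.org/wdmmg/classifier/cofog/07",
    ONT + "dataset": "http://example.org/wdmmg/dataset/example",
}


def is_url(value):
    """True if value is a string that looks like an http(s) URL."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


def linked_fields(doc):
    """Return the keys where either the key or the value is a URL."""
    return sorted(k for k, v in doc.items() if is_url(k) or is_url(v))


# The document is still ordinary JSON and survives a round trip unchanged.
serialized = json.dumps(entry, sort_keys=True)
roundtrip = json.loads(serialized)
```

The point of the sketch is only that nothing beyond ordinary JSON is
needed: the URLs are the forward-compatible hook, and a consumer that
ignores them still sees a usable document.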

Rufus

## SQL vs Mongo vs RDF

Requirements:
  * Arbitrary sparse metadata
  * Ability to reify values (and keys)
  * Dataset, Entry, Entity, Classifier
  * Web-app coders can participate

SQL
  + very standard infrastructure (many good open-source RDBMSs)
  + very familiar to coders
  + very good libraries (e.g. sqlalchemy)
  - - have to hack in a key/value structure
    * no typing on values (without lots of effort)
  - lots of joins ...
  - relatively poor match to serialization format (json)

Summary: ultimately a poor match for the data so despite maturity we
plan to move on.
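
To make the "hack in a key/value structure" and "lots of joins" points
concrete, here is a minimal sketch using Python's built-in sqlite3 module.
The table and column names (entry, entry_attr, region, year) are
hypothetical and are not the actual wdmmg schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE entry_attr (
    entry_id INTEGER REFERENCES entry(id),
    key TEXT,
    value TEXT  -- every value is stored as text: no typing on values
);
""")
conn.execute("INSERT INTO entry VALUES (1, 100.0)")
conn.execute("INSERT INTO entry VALUES (2, 250.0)")
conn.executemany(
    "INSERT INTO entry_attr VALUES (?, ?, ?)",
    [
        (1, "region", "London"),
        (1, "year", "2009"),
        (2, "region", "Wales"),  # entry 2 has no year: sparse metadata
    ],
)

# Each metadata key that should appear as a column costs one more join.
rows = conn.execute("""
SELECT e.id, e.amount, r.value AS region, y.value AS year
FROM entry e
LEFT JOIN entry_attr r ON r.entry_id = e.id AND r.key = 'region'
LEFT JOIN entry_attr y ON y.entry_id = e.id AND y.key = 'year'
ORDER BY e.id
""").fetchall()
```

With two keys this is manageable, but the query grows by one join per
key, and the year comes back as the string '2009' rather than a number.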

MONGO
  + best middle-ground document store
  + good set of libraries, Debian packages etc
  + quite good match to structure though relationships are sort of ugly
  + relatively fast (?)
  + built-in REST API
  + some neat features such as built-in geo support, sharding
    (scalability) and map-reduce
  - poor relationships
  - still quite immature (libraries are potentially changing quite rapidly)
  - not very descriptive on "predicates"

RDF
  + very nice match to data structure
  + data is being published in some places in this form
  + a converging standard
  - - no mature open source data store available
    * Getting better: e.g. Virtuoso, Jena, rdfstore, but can one get
      installable packages that work on Mac, on Windows, on Ubuntu? Are
      there debs?
  - - poor library support (options: rdflib, SuRF in Python ...)
    * client library support is improving but still seems limited
      compared with other areas
  - - significant demands on developers to understand schemas (what is
      available, how they work together etc)
  - limited analytics support (count, sum, etc)
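
As a sketch of both the "nice match to data structure" and the "limited
analytics" points, here is the same kind of data held as
subject/predicate/object triples. It deliberately uses plain Python tuples
rather than rdflib or a real triple store, and the example.org namespace
and predicate names are hypothetical. It shows one reading of the
analytics point: sparse data fits naturally, but a sum per region has to
be written by hand.

```python
EX = "http://example.org/wdmmg/"  # hypothetical namespace, for illustration

# Each fact is one (subject, predicate, object) triple; sparse data just
# means fewer triples for a given subject.
triples = [
    (EX + "entry/1", EX + "amount", 100.0),
    (EX + "entry/1", EX + "region", EX + "region/london"),
    (EX + "entry/2", EX + "amount", 250.0),
    (EX + "entry/2", EX + "region", EX + "region/wales"),
    (EX + "entry/3", EX + "amount", 50.0),
    (EX + "entry/3", EX + "region", EX + "region/london"),
]


def objects(subject, predicate):
    """Return all objects recorded for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def total_by_region():
    """Sum amounts per region by walking the triples in application code."""
    totals = {}
    for subject in sorted({s for s, _, _ in triples}):
        for region in objects(subject, EX + "region"):
            for amount in objects(subject, EX + "amount"):
                totals[region] = totals.get(region, 0.0) + amount
    return totals
```

In SQL the same total is a single GROUP BY, which is the kind of gap the
last point in the list refers to.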



