[wdmmg-discuss] Failed to port datastore to RDF, will go Mongo
Rufus Pollock
rufus.pollock at okfn.org
Wed Dec 1 11:05:03 UTC 2010
On 29 November 2010 17:19, Francis Irving <francis at flourish.org> wrote:
> Thanks for that writeup Friedrich, very interesting.
>
> Two things:
>
> 1) I'd love to see your long technical pro/con email from before. I
> can't find it in the mailing list archive, did you send it there?
Friedrich prepared a summary of the different options (SQL, MONGO,
RDF) on the main wdmmg pad <http://okfnpad.org/wdmmg> and I've inlined
the summary below (emphasis: still in progress).
> 2) I'm wondering if a lightweight linked CSV or linked JSON can help.
> By this I mean using URLs as attribute values, and even keys, for
> fields which refer to other types (for which there is an ontology, or
> you feel like making one). That would provide the forward compatible
> hook, that William refers to.
This is definitely a possibility, and it is something that Will
Waites explored back in March when using Mongo as an RDF store:
<http://wwaites.posterous.com/mongo-as-an-rdf-store>
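To make the linked JSON idea concrete, here is a minimal sketch of an
entry whose keys and values are URLs wherever they refer to other
types. The example.org URLs and the field names are assumptions made
up for illustration; they are not taken from the wdmmg data or from
Will's post.

```python
import json

# Hypothetical vocabulary prefix, for illustration only.
VOCAB = "http://example.org/wdmmg/vocab/"

# A plain key stays a plain key; keys and values that refer to other
# types are URLs, which gives the forward-compatible hook.
entry = {
    "amount": 1200000,
    VOCAB + "recipient": "http://example.org/wdmmg/entity/dept-of-health",
    VOCAB + "classifier": "http://example.org/wdmmg/classifier/health",
}


def is_link(value):
    """Treat any http(s) string as a reference to another resource."""
    return isinstance(value, str) and value.startswith(("http://", "https://"))


# The fields that point at other resources.
links = {key: value for key, value in entry.items() if is_link(value)}

# The entry is still ordinary JSON.
serialized = json.dumps(entry, sort_keys=True)
```

Nothing here needs an RDF store; the URLs only become meaningful once
something dereferences them or maps them to an ontology.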
Rufus
## SQL vs Mongo vs RDF
Requirements:
* Arbitrary sparse metadata
* Ability to reify values (and keys)
* Dataset, Entry, Entity, Classifier
* Web-app coders can participate
SQL
+ very standard infrastructure (many good open-source RDBMSs)
+ very familiar to coders
+ very good libraries (e.g. sqlalchemy)
- have to hack in a key/value structure
  * no typing on values (without lots of effort)
- lots of joins ...
- relatively poor match to serialization format (json)
Summary: ultimately a poor match for the data so despite maturity we
plan to move on.
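As an illustration of the key/value hack and the joins it leads to,
here is a minimal sketch using sqlite3 from the Python standard
library. The table and column names are hypothetical and are not the
actual wdmmg schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, amount REAL);
    -- Sparse metadata hacked in as key/value rows; every value is text.
    CREATE TABLE entry_attr (entry_id INTEGER, key TEXT, value TEXT);
""")
conn.execute("INSERT INTO entry VALUES (1, 1200000)")
conn.execute("INSERT INTO entry VALUES (2, 500)")
conn.executemany(
    "INSERT INTO entry_attr VALUES (?, ?, ?)",
    [(1, "region", "London"), (1, "year", "2009"), (2, "region", "Wales")],
)

# One extra join per attribute we want back as a column.
rows = conn.execute("""
    SELECT e.id, e.amount, r.value, y.value
    FROM entry e
    LEFT JOIN entry_attr r ON r.entry_id = e.id AND r.key = 'region'
    LEFT JOIN entry_attr y ON y.entry_id = e.id AND y.key = 'year'
    ORDER BY e.id
""").fetchall()
```

Note that the year comes back as the string '2009', and that entry 2
simply has no year row at all.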
MONGO
+ best middleground document store
+ good set of libraries and debian packages etc
+ quite good match to structure though relationships are sort of ugly
+ relatively fast (?)
+ built in rest api
+ some neat features such as in-built geo support and sharding
(scalability), map-reduce
- poor relationships
- still quite immature (libraries are potentially changing quite rapidly)
- not very descriptive on "predicates"
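To show what "good match to structure, ugly relationships" can mean in
practice, here is a minimal sketch using plain Python dicts as
stand-ins for documents in two collections. It does not use a Mongo
driver, and the field names are made up for illustration.

```python
# Plain dicts standing in for documents in two collections.
entities = {
    "dept-health": {"_id": "dept-health", "name": "Department of Health"},
}
entries = [
    # Sparse metadata is easy: each document carries only its own keys.
    {"_id": 1, "amount": 1200000, "region": "London", "to": "dept-health"},
    {"_id": 2, "amount": 500, "to": "dept-health", "notes": "sparse field"},
]


def resolve(entry):
    """Follow the 'to' reference by hand; nothing joins it for us."""
    resolved = dict(entry)
    resolved["to"] = entities[entry["to"]]
    return resolved


resolved_entries = [resolve(e) for e in entries]
```

The reference is just a string id, and nothing in the document says
what kind of thing "to" points at, which is the "predicates" complaint
above.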
RDF
+ very nice match to data structure
+ data is being published in some places in this form
+ a converging standard
- no mature open source data store available
  * Getting better: e.g. virtuoso, jena, rdfstore but can one get
    installable packages that work on mac, on windows, on ubuntu? Are
    there debs?
- poor library support (options: rdflib, SuRF in python ...)
  * client library support is improving but still seems limited
    compared with other areas
- significant demands on developers to understand schemas (what is
  available, how they work together etc)
- limited analytics support (count, sum, etc)
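A minimal sketch of both sides of this, with triples held as plain
Python tuples rather than in a real RDF store or library: the data
maps onto triples very naturally, but if the store offers no
aggregates then count and sum end up in client code. The example.org
URIs are hypothetical.

```python
# Hypothetical namespace; triples as (subject, predicate, object).
NS = "http://example.org/wdmmg/"
triples = [
    (NS + "entry/1", NS + "amount", 1200000),
    (NS + "entry/1", NS + "to", NS + "entity/dept-health"),
    (NS + "entry/2", NS + "amount", 500),
    (NS + "entry/2", NS + "to", NS + "entity/dept-health"),
]


def objects(predicate):
    """Return the object of every triple with the given predicate."""
    return [o for (_, p, o) in triples if p == predicate]


# Aggregation done by hand on the client side.
amounts = objects(NS + "amount")
total = sum(amounts)
count = len(amounts)
```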