[wdmmg-discuss] Failed to port datastore to RDF, will go Mongo

William Waites ww at eris.okfn.org
Wed Dec 1 12:45:21 UTC 2010

* [2010-12-01 15:17:07 +0300] Ivan Begtin <ibegtin at gmail.com> wrote:

] As for Mongo's advantages, I would add its flexibility and
] performance.
] For schemaless data it's probably the best choice.

I like Mongo. The main thing I have to say is a matter of programming
practice: please use URIs rather than slugs, so that data can be
referred to from outside the system, and try to arrange for these URIs
to dereference (i.e. resolve to a description of the resource over
HTTP).
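As an illustration of the difference, here is a minimal Python sketch of storing a full URI as a record's identifier instead of a bare slug. The base URI `http://data.example.org/`, the `mint_uri`/`make_record` helpers and the record fields are all hypothetical. A slug such as `uk-2009-defence` only means something inside one database; the expanded URI can be cited from anywhere and, if a web server is configured to answer for it, dereferenced.

```python
from urllib.parse import quote

# Hypothetical base URI; in practice a domain the project controls and can
# configure to serve a description of each resource.
BASE_URI = "http://data.example.org/"

def mint_uri(collection: str, slug: str) -> str:
    """Expand a local slug into a globally usable URI (hypothetical helper)."""
    # Percent-encode the slug so unusual characters still give a valid URI.
    return f"{BASE_URI}{collection}/{quote(slug, safe='')}"

def make_record(collection: str, slug: str, **fields) -> dict:
    """Build a document whose identifier is a URI rather than a bare slug."""
    return {"_id": mint_uri(collection, slug), **fields}

# Placeholder values, not real spending data.
record = make_record("entry", "uk-2009-defence", amount=1000, currency="GBP")
print(record["_id"])  # http://data.example.org/entry/uk-2009-defence
```

MongoDB accepts a string as a document's `_id`, so a record like this can be stored as-is; making the URI actually resolve is then a matter of web-server configuration outside the database.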

] And disadvantages:
]  - it's not space-efficient: the same data takes up more disk space in
] Mongo than in SQL. That's why we keep raw data outside MongoDB.

Also, cost. The main reason we abandoned the experiments with Mongo
last spring was that you need to run it on 64-bit hosts if you have a
non-trivial amount of data (OKF runs things at AWS, so that's 4x more
expensive), and because there is no checkpointing or crash recovery
you need to make sure to run the databases in pairs. So that's 8x more
expensive than a comparable SQL setup at AWS.
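The 8x figure is just the two factors multiplied together. A small Python sketch, taking the 4x (64-bit host at AWS) and 2x (servers run in pairs) factors from the paragraph above and leaving the baseline price as a free, placeholder parameter:

```python
# Factors taken from the message; the baseline price is a placeholder.
INSTANCE_FACTOR = 4  # a 64-bit AWS host costs ~4x the comparable SQL host
REPLICA_FACTOR = 2   # no checkpointing/crash recovery, so run servers in pairs

def relative_cost(baseline: float) -> float:
    """Cost of the Mongo deployment given the cost of a comparable SQL one."""
    return baseline * INSTANCE_FACTOR * REPLICA_FACTOR

print(relative_cost(1.0))  # 8.0
```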

] Also, I think it's not yet right to provide public access to the Mongo
] REST API. A better way is to have a wrapper using Rails/Django/any
] other framework that provides query-result caching.
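The wrapper suggested above amounts to a read-through cache between the public and the database. A framework-agnostic Python sketch follows; `CachedQueryAPI` and `run_query` are hypothetical names, `run_query` stands in for whatever actually talks to Mongo (or any other store), and the 60-second lifetime is an arbitrary assumption.

```python
import json
import time
from typing import Any, Callable

class CachedQueryAPI:
    """Read-through cache in front of a query backend, with per-entry expiry."""

    def __init__(self, backend: Callable[[dict], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict = {}  # query key -> (time stored, result)

    def query(self, spec: dict) -> Any:
        # Serialise with sorted keys so equivalent query dicts share one entry.
        key = json.dumps(spec, sort_keys=True)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._backend(spec)  # reached only on a miss or expired entry
        self._cache[key] = (now, result)
        return result

# Hypothetical backend that records how often the database is actually hit.
calls = []
def run_query(spec: dict) -> list:
    calls.append(spec)
    return [{"country": spec.get("country"), "total": 0}]

api = CachedQueryAPI(run_query, ttl_seconds=60)
api.query({"country": "RU"})
api.query({"country": "RU"})
print(len(calls))  # 1
```

Keying the cache on the serialised query keeps the wrapper independent of the backend, at the cost of serving results that may be up to `ttl_seconds` stale.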

More generally, there's no standard query language which is another

Experience so far with http://bibliographica.org/, which runs an RDF
store on Virtuoso, has been good. There is a lot of data (3 million
records, or "documents", making up about 173 million triples) and
performance is quite good. It does force you to think more carefully
about data modelling, though, and if you want a low-level interface to
the database (over ODBC, which is how this application is built) you
need to roll up your sleeves a bit, build some parts from source and
apply a patch or two...
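An RDF store holds everything as (subject, predicate, object) triples; at the scale quoted above that is roughly 173 million / 3 million, or about 58 triples per record. Below is a toy pure-Python illustration of the data model and of a single triple-pattern lookup, the building block that SPARQL queries are made of. The `example.org` URIs and the data are made up and not taken from bibliographica; the predicates use the real Dublin Core and FOAF vocabulary URIs. This says nothing about how Virtuoso works internally.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Made-up example data: one bibliographic record spread over several triples.
triples: List[Triple] = [
    ("http://example.org/book/1", DC_TITLE, "On the Origin of Species"),
    ("http://example.org/book/1", DC_CREATOR, "http://example.org/person/darwin"),
    ("http://example.org/person/darwin", FOAF_NAME, "Charles Darwin"),
]

def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t

# Follow a link: who created book 1, and what is that person's name?
for _, _, author in match(s="http://example.org/book/1", p=DC_CREATOR):
    for _, _, name in match(s=author, p=FOAF_NAME):
        print(name)  # Charles Darwin
```

Because the creator is itself a URI, following it to the person's name is just another pattern match. This is the property that makes data from different sources joinable once they share identifiers.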

The Mongo approach, like others, is good for building silos, even if
they are open silos. If I wanted to correlate Russian data with UK
data, for example, I would have to do a very large amount of work
figuring out what the schemaless information means and how it

As an aside, for Ruby fans, I've heard quite good things about
http://www.activerdf.org/; it would be interesting to hear

William Waites
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664

More information about the openspending mailing list