[openbiblio-dev] New instance: eu11.okfn.org

William Waites ww at eris.okfn.org
Mon Nov 15 12:09:17 UTC 2010

* [2010-11-15 11:43:39 +0000] Rufus Pollock <rufus.pollock at okfn.org> wrote:
] That's still very slow (i.e. 59h to do the whole lot!). Can one turn
] off transactions or the like for bulk uploading to speed it up?

I could turn the indexing off (particularly the full-text
search index), but then the indexes would just have to be
built after the load anyway. It might be marginally faster.

Also note that the data we have is about 3 million
records, not the entire 30 million. So, barring
the odd record that stops the import (about one in every
million records, so not a big deal to handle manually),
it should take about 5 hours for the whole import. I
think it's safe to say that all the data will
be loaded by tomorrow.
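
As a rough check on that estimate, here is a minimal sketch of the
scaling. It assumes (this is not stated above) that the 59h figure
was worked out for the full 30 million records and that the load
rate stays constant; the function name and defaults are made up
for illustration.

```python
def estimate_hours(records, reference_records=30_000_000, reference_hours=59.0):
    """Scale a reference load time linearly to a different record count.

    Assumption: the 59h estimate was for 30 million records and the
    per-record load rate does not change over the course of the import.
    """
    return reference_hours * records / reference_records


# About 3 million records are actually available.
hours = estimate_hours(3_000_000)
print(f"estimated load time: {hours:.1f} hours")
```

Under those assumptions the scaled figure is a little under 6 hours,
the same order as the estimate above, not counting any time spent
restarting after a record that stops the import.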

] Also are you doing any de-duping (at least on entities)? (Since we're
] creating them may be sensible to dedupe as part of upload ...)

Entities are given URIs of the form
http://bibliographica.org/entity/hash(name).
This is what Ben did for them, and it presumes that
the authority records are unambiguous. A more detailed
analysis of this (and other properties of the dataset)
is the next step once the data is loaded and queryable.
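
To illustrate the hash(name) scheme, a minimal sketch follows. The
actual hash function and any normalisation of the name are not
given here, so SHA-1 over the whitespace-normalised UTF-8 name is
purely an assumption, and entity_uri is a hypothetical helper, not
the code used for the import.

```python
import hashlib

BASE = "http://bibliographica.org/entity/"


def entity_uri(name: str) -> str:
    """Mint an entity URI from a name.

    Assumptions: SHA-1 is the hash and runs of whitespace are
    collapsed first; the real scheme may differ on both points.
    """
    normalised = " ".join(name.split())
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return BASE + digest


# The same name string always maps to the same URI.
print(entity_uri("Dickens, Charles"))
print(entity_uri("Dickens, Charles") == entity_uri("Dickens,  Charles "))
```

With a scheme like this, records carrying exactly the same name
collapse onto one URI without any separate dedupe pass, which is
only correct if the authority records really are unambiguous.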

In general we do *not* want to do any invasive operations
like trying to dedupe as part of the upload. Far better
to keep things separate and simple. The only modifications
to the data so far are some basic vocabulary cleanup and
the addition of some owl:sameAs links for the entities
and for ISBNs.

William Waites
9C7E F636 52F6 1004 E40A  E565 98E3 BBF3 8320 7664
