[okfn-help] ORDF feedback and questions

William Waites william.waites at okfn.org
Thu Jun 10 18:29:03 BST 2010


On 10-06-10 16:41, Alistair Turnbull wrote:
> Sorry, I have just found this draft in my "postponed messages" folder.
> It is now out of sequence, but hopefully still relevant.

No worries.

> Hmmm. How about "quote" for reify and "unquote" for disembody?

Well "reify" is the officially sanctioned nomenclature...

> I don't think the "added" and "removed" triples ever get persisted, do
> they?

They are persisted by the fourstore which only saves the extracted
metadata from the changeset and not the list of changes itself. This
could change though.

> Oh, right. The whole of the implementation apart from `undo()` is
> consistent with the interpretation that removals happen before
> additions. However, if no triple can be both added and removed then
> the question is moot.
>
> I suggest you document this assumption. If not (i.e. if you do not
> want to make this assumption) then you need to fix `undo()` so that it
> undoes additions first and then removals. That way, although it might
> remain undefined behaviour, at least it won't corrupt any data. Best
> practice would be to do both!

For the moment, as I'm about to get sucked into duplicate detection, I'm
happy to leave this to your and Graham's judgement. I agree it should be
documented, though.
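
To make the ordering point concrete, here is a minimal sketch. It assumes
a changeset is nothing more than a pair of triple sets; `ChangeSet`,
`apply_changes`, `undo_changes` and `undo_removals_first` are names made
up for illustration and are not the ORDF API.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    # Hypothetical stand-in for a changeset: two sets of (s, p, o) tuples.
    removed: set = field(default_factory=set)
    added: set = field(default_factory=set)


def apply_changes(graph, cs):
    """Apply a changeset: removals happen before additions."""
    return (graph - cs.removed) | cs.added


def undo_changes(graph, cs):
    """Undo in the mirrored order: strip additions first, then restore removals."""
    return (graph - cs.added) | cs.removed


def undo_removals_first(graph, cs):
    """The other order: restore removals first, then strip additions."""
    return (graph | cs.removed) - cs.added


t = ("ex:a", "ex:p", "ex:b")
graph = {t}
both = ChangeSet(removed={t}, added={t})  # the same triple is removed and added

after = apply_changes(graph, both)            # the triple survives
restored = undo_changes(after, both)          # the triple is still there
corrupted = undo_removals_first(after, both)  # the triple has been lost
```

If no triple may appear in both sets, the two undo orders give the same
result, which is why documenting that assumption would also settle it.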

> Wow, that's expensive. So if I completely replace a graph with a new
> graph I use enough storage to hold ten copies of the old graph plus
> six copies of the new graph.
>
> There's no good way around it. I can see that you have to store at
> least one copy of the old graph, and that it must be quoted.
>
> Do any popular triple stores have optimisations for storing quoted
> graphs? It looks doable.

I think quoted graphs are another animal altogether, and they are not
even supported by all serialisations. If you start getting into
formulas (inference rules) you get things like:

{ ?a parent ?b } => { ?b child ?a }

Here both the LHS and the RHS are "quoted graphs". In theory the
quoted graphs may themselves be reified... Fun.

As well, as Graham and David Jones and I were discussing on
IRC the other day, there might be other information you would
want to add to the reified version of a statement, for example,
(valid_from, valid_to) times, perhaps a confidence metric, etc.
This is well beyond current practice and not supported in any
special way by any triple stores I am aware of.

I'm particularly interested in confidence metrics at the moment
because they seem relevant for deduping - e.g.

 a owl:sameAs b with probability 0.8

this is definitely new territory though.
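
A rough sketch of what that could look like, using plain tuples rather
than any particular triple store. The `rdf:Statement`, `rdf:subject`,
`rdf:predicate` and `rdf:object` terms are the standard RDF reification
vocabulary; the `EX` namespace, the `confidence` property and the
`reify`/`disembody` helpers are assumptions for illustration only.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/ns#"  # hypothetical namespace for the annotations


def reify(statement_id, triple, **annotations):
    """Describe one triple as an rdf:Statement, plus optional annotations."""
    s, p, o = triple
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]
    # Extra facts about the statement itself, e.g. confidence or validity dates.
    for name, value in sorted(annotations.items()):
        triples.append((statement_id, EX + name, value))
    return triples


def disembody(triples, statement_id):
    """Recover the original triple from its reified description."""
    parts = {p: o for s, p, o in triples if s == statement_id}
    return (parts[RDF + "subject"], parts[RDF + "predicate"], parts[RDF + "object"])


same = ("http://example.org/a", OWL + "sameAs", "http://example.org/b")
reified = reify("_:stmt1", same, confidence=0.8)
original = disembody(reified, "_:stmt1")
```

Note that one statement becomes four triples before any annotation is
added, which is part of where the storage cost comes from.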

On the bright side, w.r.t. expense, most operations will never
happen on the reified versions of statements and disk space is
cheap :P

>> Can be a DAG especially since a changeset that contains changes
>> for two or more graphs may have several ancestors.
>
> Good example: you should use it in the docs. Also, I note that you
> support everything you need to write a "merge" tool for
> non-conflicting changes to the same graph.

Cool! Off to the house to try to talk TalkTalk into turning on our
DSL early...
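
For the merge idea, a minimal sketch under the assumption that two
changesets conflict only when one adds a triple the other removes. The
`ChangeSet` class, its `parents` field and the `merge` function are
hypothetical and not taken from ORDF; the point is just that the merged
changeset ends up with two ancestors, so the history is a DAG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    # Hypothetical model: a named changeset with triple sets and parent names.
    name: str
    removed: frozenset = frozenset()
    added: frozenset = frozenset()
    parents: tuple = ()


def conflicts(a, b):
    """Triples that one changeset adds while the other removes."""
    return (a.added & b.removed) | (a.removed & b.added)


def merge(name, a, b):
    """Combine two non-conflicting changesets into one with both as parents."""
    clash = conflicts(a, b)
    if clash:
        raise ValueError(f"conflicting triples: {sorted(clash)}")
    return ChangeSet(
        name=name,
        removed=a.removed | b.removed,
        added=a.added | b.added,
        parents=(a.name, b.name),
    )


t1 = ("ex:book", "ex:title", "New title")
t2 = ("ex:book", "ex:title", "Old title")
t3 = ("ex:book", "ex:author", "ex:someone")

left = ChangeSet("left", removed=frozenset({t2}), added=frozenset({t1}))
right = ChangeSet("right", added=frozenset({t3}))
merged = merge("merged", left, right)
```

Anything cleverer than "reject on overlap" would need a real conflict
policy, which this sketch deliberately leaves out.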

Cheers,
-w

-- 
William Waites           <william.waites at okfn.org>
Mob: +44 789 798 9965    Open Knowledge Foundation
Fax: +44 131 464 4948                Edinburgh, UK


