[data-protocols] SLEEP / The Cut-Out

Thu Jan 31 18:08:10 GMT 2013

First off: Hi Ian, and welcome!

On 28 January 2013 17:10, Ian Bicking <ian at ianbicking.org> wrote:
> Hi all.  I came upon the SLEEP concept:
> http://www.dataprotocols.org/en/latest/sleep.html
>
> I have a project which is very similar and might be useful for progressing
> the idea: http://thecutout.org/ and the protocol specifically:
> http://thecutout.org/protocol.html
>
> The protocol even looks strikingly similar, probably because it's a natural
> way to do time-ordered updates.  The basic protocol with The Cut-Out is:
>
> GET /bucket?since=INDEX&collection_id=ABCDEF
>
> This returns a sequence of all updates since INDEX (a counter, not a
> timestamp), and collection_id asserts the identity of the bucket (if for any

+1 on this. Conversion from a timestamp to the counter is usually
straightforward.

In CKAN we adopted this route (with revision_id in place of since=INDEX).
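
To illustrate the counter-based "since" query, and why going from a
timestamp to the counter is usually straightforward, here is a minimal
Python sketch. The UpdateLog class and its method names are hypothetical
(they are not part of The Cut-Out, SLEEP or CKAN), and it assumes the
server keeps a timestamp next to each counter value.

```python
import bisect


class UpdateLog:
    """In-memory, append-only log of updates indexed by a counter (hypothetical)."""

    def __init__(self):
        self._timestamps = []  # timestamp of each update, in append order
        self._objects = []     # the update with counter i + 1 is stored at position i

    def append(self, timestamp, obj):
        # Assumption: timestamps arrive in non-decreasing order.
        self._timestamps.append(timestamp)
        self._objects.append(obj)
        return len(self._objects)  # counter assigned to this update

    def since(self, index):
        # All updates whose counter is strictly greater than `index`,
        # as [counter, object] pairs like the result format quoted in this thread.
        return [[i + 1, obj] for i, obj in enumerate(self._objects) if i + 1 > index]

    def index_for_timestamp(self, timestamp):
        # Counter of the last update made at or before `timestamp`
        # (0 if there is none), found by binary search.
        return bisect.bisect_right(self._timestamps, timestamp)
```

A client that only knows "I last synced at time T" could then call
log.since(log.index_for_timestamp(T)) to get everything newer.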

> reason the bucket is overwritten, then INDEX will become meaningless and the
> client must do a complete sync).
>
> The result is:
>
> {objects: [
>   [INDEX1,
>    {type: "object_type",
>     id: "unique id for objects of this type",
>     data: {unstructured JSON data}
>    }
>   ], ...]}
>
> And to save updates:
>
> POST /bucket?since=INDEX&collection_id=ABCDEF
> [{type: "object_type", id: "some id", data: {...}}, ...]

I guess my concern here (and one I also had about SLEEP) is that a
sync protocol *on its own*, without information on e.g. how change
objects are generated, how merging works, and what the diff format is,
is only so useful (and one is avoiding the hard part ;-) ).

For example, re SLEEP, a natural use case was syncing, say, two
SQLite databases. In that case the hard work would be generating the
changes from the SQLite db.

Thus, I'm wondering if you have an example of using this protocol live
to do syncing and, if so, how the sync side integrated with merging
(and possibly diffing). (cf
http://www.dataprotocols.org/en/latest/revisioning-data.html)
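
To make the "generating the changes" step concrete, here is a minimal
Python sketch that diffs two snapshots of a SQLite table into change
objects shaped like the {type, id, data} items quoted above. This is one
naive approach under stated assumptions, not how SLEEP or The Cut-Out do
it: the function names are hypothetical, the table is assumed to have an
"id" column, and the "deleted" marker is invented for illustration (the
protocol text quoted here does not say how deletions are represented).

```python
import sqlite3


def snapshot(conn, table):
    """Read every row of `table` into a dict keyed by its `id` column.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY id")
    names = [column[0] for column in cursor.description]
    rows = [dict(zip(names, row)) for row in cursor.fetchall()]
    return {row["id"]: row for row in rows}


def diff(old, new, object_type):
    """Return change objects turning snapshot `old` into snapshot `new`."""
    changes = []
    for key, row in new.items():
        if old.get(key) != row:  # new row, or row whose contents changed
            changes.append({"type": object_type, "id": key, "data": row})
    for key in old:
        if key not in new:  # row disappeared; "deleted" is a made-up marker
            changes.append({"type": object_type, "id": key, "deleted": True})
    return changes
```

Comparing full snapshots is only workable for small tables; it also says
nothing about merging, which is the part still open in this thread.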

> It's not a peer-to-peer system, the server is considered the canonical
> source of truth.  Clients must get all updates and resolve conflicts before

Can you sync from multiple sources or is it one-way? Also, what happens
if I've written locally and then sync and get conflicts (cf. the
question above)?

> POSTing new items (this could cause some problems if there's a lot of
> activity – but we are not assuming that objects are independent, and so
> there might be a conflict even among different objects).  There are also
> other details to the protocol, handling these conflicts – I haven't used
> REST principles and instead try to make every request advance the sync
> process.  So for example if you POST updates and get a conflict then it'll
> return the objects you haven't seen, because you'll need those to complete a
> second POST.  I think there's a wide variety of use cases where this kind of
> non-REST approach will be more efficient.  Individual objects are also not
> URL-addressable, though there is support for lazy fetching of large objects:
> http://thecutout.org/protocol.html#blobs

Makes sense.
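
As I read the description above, the flow is roughly: a POST carrying a
stale "since" is rejected, and the response already contains the updates
the client has not seen, so it can resolve and retry. A minimal Python
sketch of that idea follows. The Bucket class and the "conflict" and
"latest" response fields are hypothetical names chosen for illustration;
the real response format is whatever http://thecutout.org/protocol.html
specifies.

```python
class Bucket:
    """Toy in-memory bucket where the server is the canonical source (hypothetical)."""

    def __init__(self):
        self.updates = []  # the update with counter i + 1 is stored at position i

    def get(self, since):
        # Updates with a counter greater than `since`, as [counter, object] pairs.
        return [[i + 1, obj] for i, obj in enumerate(self.updates) if i + 1 > since]

    def post(self, since, objects):
        latest = len(self.updates)
        if since < latest:
            # The client is behind: refuse the write, but hand back what it
            # has missed so the same round trip advances the sync.
            return {"conflict": True, "objects": self.get(since)}
        self.updates.extend(objects)
        return {"conflict": False, "latest": len(self.updates)}
```

A client would resolve its local changes against the returned objects
and POST again with "since" set to the highest counter it received.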

> Then I suppose I should note "why" I wrote The Cut-Out.  Because I could, of
> course.  But more specifically I was working on a project that involved
> syncing data across clients, and while the project got thrown away I had my
> mind all up in the concept, and I started seeing more and more reasons for
> time-ordered data.  Also I saw a specific use case of stand-alone HTML
> applications with no server, which can be entirely functional but lack
> backups and data synchronization across devices.  That's what The Cut-Out
> actually provides – both a server and client that handle browser-based data
> persistence, along with authentication using Persona
> (https://developer.mozilla.org/en-US/docs/persona).  The server is also
> written with this particular use case in mind, emphasizing low overhead for
> individual buckets, and a write-heavy workload (there's many cases where
> there will be writing and never reading).  The server is arguably
> over-optimized ;)  I

It would be great to have a pointer to a live example with a "real"
app (which I imagine, as per above, incorporates merging).

> There are some private parts of the server protocol that allow servers to
> balance and move buckets around, similar I think to aspects of the CouchDB
> protocol, but they are very specific to the implementation.  The general
> concept is not peer-to-peer, meaning that a central canonical server is
> essential to how the syncing works.

Understood.

Rufus