Whoops, I built a data replication engine (3/3).
In Part One and Part Two of this epic series, I covered the reasons why I decided to build my own sync service for Shopi, and the git-inspired approach I ended up taking. In this final part I’ll talk about how Basis handles some other tricky bits, and wrap up with some lessons learned.
Diverging States
The scenario: while on the train home from work, Bob opens the shopping list app on his phone to add a few items. His phone gets a terrible signal while on the train, and so it’s not connected to the sync server while he adds the items.
Meanwhile that morning, Bob’s partner Alice has already added several other items. When Bob’s phone finally gets a signal, and the shopping list app checks in with the sync server, the server sees a situation like this:
Bob’s added ‘Cheese’ and ‘Milk’, and Alice has added ‘Milk (2 litres)’ and ‘Apples’. Because we’re tracking which states they did this from, we can see that these are divergent from each other. And, because we’re not a quantum system, we cannot be in two states at once. So we do what git does in this sort of situation, we merge the two states into one:
Above, we see state 6f00
at the bottom has been created as a merge state for its two parents. On the left, the server has determined the delta that needs to be sent to Bob to bring him in line with the merged state, on the right is the delta for Alice.
We can see the merged list that both of them will end up with at the bottom. For Alice, the delta that gets her to this state is simply to add ‘Cheese’ to the end of her list:
{:items {:+ {3 {:name "Cheese"}}}}
For Bob, it’s a little more involved. Bob added ‘Milk’ at position 2 in his list, after Alice had already added it at position 1. Also, Alice entered an amount, while Bob didn’t specify anything. The server decides Alice gets priority by getting in first, so instructs Bob’s client to move ‘Milk’ from position 2 to 1, and to set ‘1 litre’ as the amount. This is the meaning of the :=
(update) operation applied to the items:
:= [2 {:amount "2 litres"} 1]
In the delta protocol, this means: ‘move item at position 2 to position 1, and set the :amount
property to ‘2 litres’.
To complete the process, Bob also needs to add ‘Apples’ to his list:
:+ {2 {:name "Apples"}}
You can imagine even more complex sorts of divergence that we need to handle. Bob deleted item 1 while Alice both moved and modified it. Both Bob and Alice entered conflicting amounts for an item. Etc. Basis handles these conflicts with a set of rules that generally produce the expected result, plus optional custom conflict handlers.
The Clever Bit
In my experience, in any large project there’s the 95% that’s mainly legwork and book-keeping. And then there’s the 5% that’s hard, that demands some quality hammock time.1 The clever bit.
In this case, the clever bit was working out how to compute these merge deltas efficiently. You might recall the golden rule from the beginning of the project: the amount of overhead required for a change should be proportional to the size of the change. Small changes should require small amounts of work.
Now, one way to generate the merge state deltas would be to generate a sensible merge of the two lists (which is moderately challenging in itself), then compute deltas by diff’ing between that and the two states on the clients. The catch is that the overhead of computing a diff is proportional to the size of the data you need to examine, so the bigger the data, the more overhead. With this approach, the amount of work scales with the size of the data, not the size of the update.
So the clever bit is an algorithm in Basis that computes a merge delta by only looking at the input deltas. The other clever bit was the algorithm to fold a series of sequential deltas into one, once again without resorting to diff’ing data (folding is used both in the merge algorithm and the garbage collector). Yet another clever bit was multi-master replication, which I won’t even try to go into here because that was a whole lot more hammock time.
Client Side
I haven’t talked about how I handled the client side, even though the need to put nearly all the same sync logic on the client as on the server was one of key issues I identified early on.
Well, it turned out that this was not quite the case. Of course the client needs to know something about the sync state model, but it doesn’t have to do as many of the clever bits. In particular, it doesn’t have to handle merges since it only ever talks to one other ‘peer’, the server itself.
In short, the client only needs to:
- Know the state ID that its data is based on (the server’s state).
- Keep a copy of the data that corresponds to the server state.
- Be able to generate a delta between the current (possibly modified) client state and the server state.2
- Be able to apply a delta from the server to the current state.
- Be able to calculate the state ID for its current state.
I won’t go into further detail on the client side, except to say that the Basis iOS client library is around 3,000 lines of Objective-C, of which only half is sync logic: the other half is the network layer, including TLS and authentication. So the client side ended up being gratifyingly manageable.
The Root Of All Evil?
Was the work spent on the clever bits of Basis an egregious case of premature optimisation, or wheel re-invention? I asked myself that question repeatedly as I lay in my hammock (I wish, it was actually my office floor) staring at the ceiling, more than a little chagrined at being there when I had simply meant to make a shopping list app.
But I couldn’t fault the golden rule: if I want this thing to have any chance of scaling in the future, I need this property. And, if I’m honest, once I’ve started down a rabbit hole like this I find it difficult to cut my losses and admit I’m too thick to work it out.
If I was to develop Shopi now, would I do the same thing? Nope, I would experiment with building on top of Couchbase Lite or Datomic. But now I’ve weathered the pain, and the server and app are live, I’m happy to have ownership of my product’s core. The Basis server is a known quantity to me, its sync latency is lower than Couchbase’s, non syncing-related services like client account management are integrated better, and the client API is better suited for my uses.
Basis is also written in Clojure, a language that has made me consistently happy, even while battling demons in my hammock.
I’ll end this 4,000-word magnum-opus-pretending-to-be-a-blog-post now. Thank you for humoring me. There’s an awful lot to Basis that I didn’t feel the need to use up internet for, but I’ll be happy to answer any questions.
The bit where you need to say: Stop! Hammock time.
Sorry. ↩︎
Alert readers will notice that I cheated on my golden rule: the client effectively ‘folds’ its deltas by keeping an (immutable) copy of the server state and generating diffs between that and the client’s modified state as needed. This approach is purely the product of laziness on my part: if it becomes a performance issue, I’ll address it by porting the delta folding algorithm from the Basis server. ↩︎