Note! This post assumes a bit of familiarity with Merkle Search Trees (MSTs), an internal data structure used in atproto repositories. It's a niche subject that most people don't ever need to care about! (+extremely cool if you're curious and haven't had a chance to read about them!)
There's a fun line about mitigating attacks on atproto's Merkle Search Trees in the draft IETF specification, Authenticated Transfer Repository and Synchronization. Implementations should
Limit the number of entries per MST node to a statistically reasonable maximum
— draft spec section 5.2, mst structure attacks
atproto.com echoes the vagueness:
[...] implementations should limit the number of TreeEntries per Node to a statistically unlikely maximum length.
— Repository spec, Security Considerations
What actual number should we pick for this maximum??
What's a node?
An MST is a tree (right in the name). A tree made of nodes:
Nodes connect together to form the tree. An Entry ("TreeEntry") is one key/value pair inside a node. Keys are lexicographically sorted from left to right within entries, and in fact are globally sorted across the entire tree.
To raise the entries from just a sorted list into an actual tree, every key is assigned a layer, which you can calculate:
hash the key (atproto uses sha256)
count leading zeros
divide by a constant (atproto: / 2)
And that gives you the layer of your key
The divide-by-2 determines the MST's "fan-out", which is 4 for atproto. Only 25% of sha256 hashes start with 00… so 75% of all keys are in layer 0, 4x more keys (fan-out!) than layer 1.
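The three steps above can be sketched in Python (this is a sketch of the calculation, not any particular implementation's code):

```python
import hashlib

def key_layer(key: str) -> int:
    """MST layer of a key: leading zero bits of sha256(key), divided by 2."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    zeros = 0
    for byte in digest:
        if byte == 0:
            zeros += 8  # a whole byte of zeros: keep counting
        else:
            zeros += 8 - byte.bit_length()  # leading zeros within this byte
            break
    return zeros // 2  # the fan-out constant: atproto divides by 2
```

Feed it a pile of random keys and roughly 75% of them should come back as layer 0.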
Hey we're determining a tree structure based on a hash function! A statistical data structure! And
An attacker controls her own keys; you have to deal with her MST. She can pick keys to manipulate the tree shape.
The spec doesn't elaborate on the attack, but I suppose a super-wide node (tonnes of entries) could be inefficient:
Wastes bandwidth on the firehose
If you have limited space to buffer, maybe it's too big and your code breaks?
We're supposed to pick a "reasonable" maximum, and we'll have to reject entire MSTs that exceed it (conclusion foreshadowing?) since generally the whole tree has to be processable to be validatable (thanks, Merkle!).
You can just make wide nodes
Layer 0 is the bottom. It can't have subtrees, so a simplified layer 0 node can look like this:
To make this node wider, you just insert entries whose key:
belongs at layer 0
sorts between keys A and C
This is easy!
Any key has a 75% chance of being in layer 0, so when "mining" them you only have to discard 25% of your attempts!
There is a lot of space to insert between most pairs of keys! Between A and B you have keys like AAAAAAAA, AAAAAAAB, AAAAAAAC, …
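Here's a sketch of that "mining" loop (function names are mine, hypothetical): brute-force suffixes until you've collected enough layer-0 keys that sort between two bounds.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    n = int.from_bytes(digest, "big")
    return len(digest) * 8 - n.bit_length()

def mine_layer0_keys(lo: str, hi: str, count: int) -> list[str]:
    """Brute-force `count` keys that sort between lo and hi and hash to layer 0.
    ~75% of candidates qualify, so this terminates quickly."""
    found, i = [], 0
    while len(found) < count:
        candidate = f"{lo}{i:08d}"  # extending lo sorts after lo (and before hi)
        i += 1
        if leading_zero_bits(hashlib.sha256(candidate.encode()).digest()) < 2:
            found.append(candidate)  # fewer than 2 leading zero bits → layer 0
    return found
```

Run it between A and C and every key it spits out would pile into that same layer-0 node.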
Statistically reasonable maximum
So it's very easy to create very wide nodes at layer 0.
There should be 4 entries per node on average (fan-out), but there can be more. There should be some statistical distribution we can find in non-malicious MSTs.
What should we set as a reasonable maximum?
PAUSE HERE AND WRITE DOWN YOUR GUESS.
THE AVERAGE IS 4. HOW HIGH CAN IT GET?
Someone who likes math more than me could think about this and figure out what's statistically reasonable. I prefer to avoid thinking when possible, and I've got 490K real atproto MSTs to play with instead. Let's just
walk every MST
count the entries in every node
put the counts on a graph
hope it resembles a not-too-weird distribution
???
pick a reasonable maximum
also maybe
skip the whole MST if its highest layer is less than 4.
…to "minimize low-count bias from tiny trees without many entries" or some shit. (lots of empty and mostly-empty repos)
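The plan above, as a sketch (the data shapes here are hypothetical, for illustration — the real script parses nodes out of CAR files):

```python
from collections import Counter

def entry_count_histogram(msts) -> Counter:
    """Histogram of entries-per-node across many MSTs.
    Each MST is given as (highest_layer, nodes); each node is a list of
    entries. These shapes are made up for the sketch."""
    hist = Counter()
    for highest_layer, nodes in msts:
        if highest_layer < 4:
            continue  # skip tiny trees to minimize low-count bias
        for entries in nodes:
            if not entries:
                continue  # empty step-down nodes don't count
            hist[len(entries)] += 1
    return hist
```

Dump the resulting counts into a sheet and hope for a friendly curve.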
Histograms <3
Let's try this out on one (very large) real repository MST:
Ok! Hi! Hey! it's our first hint at a distribution!
…why are there some zero-entry nodes?
in atproto MSTs the node links are not allowed to skip layers, so empty nodes are inserted where needed to step down. we can safely ignore them:
Take 2
Okay okay! look at that exponential fit! beauty! decay! Is -0.288 our magic number? Did we just hit a fifty-entry-wide node on our first go??
We expected 4 entries per node on average. With 701,765 total nodes our average was 3.9991! This feels pretty good!
but
we need more data. (take 3)
Fun fact, this hacky lil script chews through CARs at >1.7GiB/sec with repo-stream ✨
whoa. wait. tens of thousands of entries?? no. oops. augh. i mixed up the counts and the counts of counts when merging maps.
Take 4 (231GB of repos in just 2.5 mins, with repo-stream✨)
much better. ooooooooh. (log-scale on the Y-axis!)
The exponential fit is very close to our earlier single-repo sample, with a hair higher decay rate (the e^(-0.289x)). Average entries per node is slightly lower, 3.986 (💅 did i say low-count bias earlier?)
We can turn this into a Probability Density Function (PDF) by throwing that leading coefficient in the trash and replacing it with a copy of the decay rate (sounds legit right):
Probability(x) = 0.289*e^(-0.289x)
We just processed 163 million MST nodes, and observed one with 62 entries. How likely was that?
Probability(62) = 0.289e^(-0.289 * 62) = 0.00000048%
Prob of not seeing it 163M times= (1-0.0…048)^163M = 45.7%
Prob of seeing it = 1-0.457 = 54%
A bit better than 50% chance! And we did see it! Plausible!
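Redoing that arithmetic in Python, with the same numbers as above:

```python
import math

decay = 0.289            # fitted decay rate from the histogram
nodes = 163_000_000      # MST nodes processed
widest = 62              # widest node observed

p = decay * math.exp(-decay * widest)  # PDF at 62 entries, ≈ 4.8e-9
p_missed = (1 - p) ** nodes            # chance of never seeing one, ≈ 0.46
p_seen = 1 - p_missed                  # ≈ 0.54
```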
Two-coin-flip game
I started thinking. Remember this?
To raise the entries from just a sorted list into an actual tree, some keys are assigned a layer > 0, which you can find: layer = leading_zeros(sha256(key)) / 2
— me, earlier in this post
Layer-0 Nodes are made of entries with adjacent keys at layer 0. They keep going until they get interrupted by a higher-layer key:
The first two bits of every key's sha256 are "random" and "independent", and we can consider every possible combination:
11… like 'B': 0 leading zeros / 2: layer = 0
10… like 'E': 0 leading zeros / 2: layer = 0
01… like 'A': 1 leading zero / 2: layer = 0
00… like 'D': 2+ leading zeros / 2: layer > 0 (25% chance)
Let's play a game with "random" and "independent" outcomes, heads or tails. We flip two coins:
If either are heads: you win! points += 1
Else both are tails: I win + take all your points 😈
(so like, heads = 1, tails = 0, we're flipping the first two sha256 bits. order doesn't matter because [check the combinations table above again])
You're trying to get a long sequence of wins. With every pair of flips you have a 75% chance of winning, and I have a 25% chance of spoiling your run.
Time to unthinkingly spew numbers onto a sheet. Your chance of winning at least the first flip-pair is 75%; making it to two-in-a-row, 75% × 75% = 56.25%; three, 75% × 75% × 75% = 42%ish, etc.
Looking familiar (log scale alert!). Dragging it out to 62 flip-pairs:
Soothingly familiar: That -0.288 fitted exponential decay rate is just about bang on with the experimental data! Ahhhh. I'm relaxed.
Every time you write to your atproto repository, you're playing this game with the MST in your PDS. You might extend a long run of layer-0 entries for a wider and wider node, or might spoil it by inserting a key that lands at a higher layer.
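You can also simulate the game directly and watch the 0.75^x curve fall out (a sketch, not real MST code):

```python
import random

def simulate_run_lengths(n_pairs: int, seed: int = 42) -> list[int]:
    """Play the two-coin-flip game n_pairs times and record every run of wins.
    A loss (both tails, p = 0.25) ends the current run, like a higher-layer
    key ending a stretch of layer-0 entries."""
    rng = random.Random(seed)
    runs, current = [], 0
    for _ in range(n_pairs):
        if rng.random() < 0.75:  # at least one head: the run continues
            current += 1
        else:                    # both tails: the run is cut here
            runs.append(current)
            current = 0
    return runs
```

The mean run length should land near 0.75/0.25 = 3 layer-0 entries per cut, and the share of runs reaching at least x decays like 0.75^x.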
I'm pretty sure this two-coin-flip model is correct as MSTs become large. The nice thing about having a model is that we can be exact about things, not at the mercy of google sheets for only three measly decimal places. Like,
Rate of exponential decay of entry sizes =
ln(75%) = -0.28768207245178092743921900599…
Statistically reasonable maximum
an attempt!
Reasonable people[who?] seem to think that UUIDv4s (random) have a low enough chance of colliding to reasonably be considered "Universally Unique".
UUIDv4s have 122 random bits, or 2^122 total possible unique values. You actually have a 50% chance of colliding at least once after just 2^61 attempts, thanks to birthdays (61 = 122/2).
So would it be reasonable to accept a 50% chance of finding a more-than-maximum-entries MST node only after seeing at least 2^61 nodes?
Please someone tell me what to do if this reasoning is invalid.
There is probably some subtlety with the PDF functions of exponential decay vs whatever the birthday problem looks like, that might matter?
The chance of seeing a node at least x entries wide is 0.75^x (survival function!). We want to bring that down to 50% after 2^61 tries, or equivalently, bring its CDF up to 50% after 2^61 tries:
(1 - 0.75^x)^(2^61) = 50%. Solve for x:
reasonable max = 148
at a UUID standard of reasonableness
UUIDs are cool, but before getting too comfy with 122 bits as reasonable, let's also ask cryptographers:
Use 256-bit random numbers
— Cryptographic right answers, random IDs
oop! cryptographers like BIG birthdays, 2^128! (128=256/2)
My calculator melted trying to raise exponents that big, but spreadsheets and linear regression never let me down:
And it is linear! So we have a general solution?
reasonable max = 2.41 * birthday + 1.34
Where 2^birthday is your reasonable number of nodes for a 50% chance of having your max exceeded.
Applying cryptographers' birthday of 128 (are cryptographers reasonable people though):
crypto-reasonable max = 310
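Instead of melting calculators, the whole thing can be solved in log space: taking logs of (1 - 0.75^x)^(2^b) = 0.5 and using ln(1 - ε) ≈ -ε for tiny ε gives 0.75^x ≈ ln(2)/2^b. A sketch:

```python
import math

def reasonable_max(birthday_bits: float, p: float = 0.75) -> float:
    """Node width x where (1 - p**x)**(2**birthday_bits) = 0.5.
    For huge 2**bits, ln(1 - eps) ≈ -eps gives p**x ≈ ln(2) / 2**bits,
    which never leaves float range."""
    target = math.log(2) / 2.0**birthday_bits
    return math.log(target) / math.log(p)
```

This gives ≈ 148 at the UUID birthday (61) and ≈ 310 at the cryptographers' birthday (128). Expanding the logs also gives the line exactly: x = 2.409·birthday + 1.274, which the 2.41/1.34 regression above tracks pretty well.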
***
Impact on the atproto network
Whether you pick 148 or 310 as an MST node entry limit, you're probably fine. But remember how anyone can easily force MST nodes to be as wide as they want? ("You can just make wide nodes" above).
I'm starting to think that that's the real attack here: network compatibility.
If your app rejects MST nodes wider than 148 but my PDS accepts them up to 310, I can make my whole account invisible on your app by "mining" a node with 149 entries.
Worse, granting any app write permission (in any collection!) could be enough for that app to break your account on any other app that enforces a lower limit!
Seems like we might want to pick a number that everyone agrees is reasonable and include it as mandatory in the spec. Someone should definitely check the math, but I'm putting forward that somewhere between 148 and 310 is probably the range we should land in.
***
Update!
Sekoia likes math more than me and did some thinking about it!
This sounds pretty good! And in that framing, 165 seems like a safer lower bound than 148.
But also— if you give me a couple pairs of numbers, it should be obvious what I'm about to do next:
That 0.288 from Sekoia's model is very close to my exponential model's decay rate, so I think our models align in the big picture? Tell me if what I did doesn't make sense.
Going to give her the last word here:
Also, Sekoia, if i may: