ScudDB: a distributed database in Rust using libp2p for its metadata sharing

For my masters thesis I’ve created a distributed database, which aims to provide strongly consistent data storage for some edge compute applications. It achieves this by giving specific nodes authority over a certain prefix. These nodes can then service reads, writes and create new sub-prefixes without having to communicate with any other nodes. For example, one node might have authority over “/”, another over “/students/” and another over “/professors/”. These nodes oversee the creation of new sub-prefixes on other nodes, such as “/students/Bob/” and “Professors/John/” respectively.

This model also has attractive failure tolerance properties since there’s no centralized entity responsible for prefix distribution.

Clients can query the entire database through a single database node because requests are forwarded to the appropriate node. However, for request forwarding, traveling up the entire prefix hierarchy and back down to the correct node is slow.
To solve this problem, I used libp2p’s gossipsub to spread metadata about which prefixes are stored where. Compared to using a classic pubsub system for this purpose, libp2p lets me simplify the architecture of the entire system, without introducing another point of failure.

The (undocumented) implementation can be found here.

The libp2p part responsible for metadata sharing, utilizing gossipsub with kademlia and Identify for peer discovery can be found here: server/src/metacom/mod.rs · master · Frederik Baetens / scudDB · GitLab

I think this is an interesting use case, because unlike most(?) users of libp2p, no byzantine fault tolerance is required, yet I still benefit from its failure tolerance and relative simplicity.

If anyone has some questions, suggestions, or knows of others building similar systems, I’d love to hear about it! So far I know about Cloudflare workers KV & Durable Objects and some academic databases.

1 Like

Hi Frederik,

Thanks for sharing your project! Cool to see this use-case of libp2p.

I would be especially interested in the performance characteristics of this architecture. Did you do some benchmarking or simulations?

If anyone has some questions, suggestions, or knows of others building similar systems, I’d love to hear about it! So far I know about Cloudflare workers KV & Durable Objects and some academic databases.

@yiannisbot and Dietrich have a great overview over the space. What do the two of you think?

Another DB project that comes to my mind, building on top of IPFS and libp2p, is https://github.com/orbitdb.

1 Like

Side note, cool name!

ScudDB is named after a low to the ground cloud.

1 Like

So the forwarding itself, once the metadata is present, is quite fast. I maintain a lazy connection pool (only keep connections after they’re needed), so forwarding a request adds just a single round trip per hop (usually just one b/c the metadata is usually always present). Since the forwarding usually just involves the node the client is connected to, and the node with data ownership, there’s no central node or cluster to bottleneck request forwarding.

I haven’t benchmarked the prefix creation and associated metadata dissemination though. Do you know what kind of throughput one can expect from libp2p gossipsub & what variables affect it?

I’ll try to run a quick prefix creation benchmark on a few vms, but if there’s potential gossipsub bottlenecks caused by large network sizes, I don’t really know how to test that. How do you run libp2p tests when testing 50+ nodes? Or do you just not run those kinds of tests? Do you use IAC tools like pulumi or something like ansible?

You can use testground The sdk in rust is still WIP.

You can run simulations. with my computer I can run 40+ instances.

1 Like

Hi,
Very interesting project.

I have some questions (I am new to p2p so sorry if my questions are dumb):

Is it possible to read your thesis somewhere?

How nodes have authority over a prefix ? How do you prevent two nodes to have the same prefix, for example, /students ?

If I am a node and create the sub-prefix /students/Bob the student bob data will only live in one node ? What if the node with bob data goes offline, is it possible to get the data ?

If I want to get all /students from the database is there a way to know if I got all students or just part of the students, for example let’s say there are 10 students (assuming that database is on a strongly consistent state for the 10 students), but a node got down, and I only got 7, is there a way to know that my result is incomplete ? If yes, how did you solve it?

Can you give some use case of the database, how would this be used in a real university, for example?

Thanks.

Is it possible to read your thesis somewhere?

If you speak dutch :). Unfortunately I am somewhat obligated to write my thesis in Dutch b/c of language policies at my university.

How nodes have authority over a prefix ? How do you prevent two nodes to have the same prefix, for example, /students ?

Strong consistency and atomicity in prefix creation is enforced by only allowing the node with authority over a parent prefix to give out authority over new sub-prefixes.

For example: let’s say node A has authority over the prefix “/”, and node B and C both simultaneously want to create the new prefix “/students”. Both B and C will send a request to node A, to request permission to create the new prefix “/students”. Node A will handle these requests as atomic transactions, meaning one request will get handled strictly before the other. Let’s say B’s request is handled first, meaning that B now gets permission to create the “/students” prefix, and C’s request will be denied.

In the end, only B will have authority over the “/students” prefix, and C will not have been able to acquire authority over the “/students” prefix.

If I want to get all /students from the database is there a way to know if I got all students or just part of the students

Yes, this is possible. As explained in the previous section, a parent prefix needs to be informed about the creation of all sub-prefixes. This means that the node with authority over the “/students” prefix will have knowledge of all sub prefixes of “/students”, so it will be able to tell you about all students.

If a node with authority over one of the students is down, you will get an error because that node would be unreachable, so you would know that you are missing some students.

Can you give some use case of the database, how would this be used in a real university, for example?

I don’t know if it’s useful to use this kind of database for a university, because universities usually aren’t spread all over the world, meaning that they wouldn’t really benefit from distributing their data globally, like ScudDB allows for.

A simple use case is a chatroom application, where people mainly talk to other people in the same region, meaning that you could locate the chatroom and its messages in the same region as the majority of the users in that chatroom.

I hope that clears up your questions. If anything is still unclear, feel free to ask :slight_smile:

Hi,

If you speak dutch :). Unfortunately I am somewhat obligated to write my thesis in Dutch b/c of language policies at my university.

I can read pictures lol, yeah I can’t read Dutch…

Thank you for your answers, very interesting concept, one of the use cases that comes to my mind would be a discussion board like this one.