Thoughts on pubsub topic authentication

Hi,

I was just reading over the quite nicely written design document of pubsub.

That triggered me to think a bit about authentication on a topic level, that’s basically the AuthOpts in that document.

While thinking about it i’m stuck in the thought of “who manages the registration”?
Assume an implementation would set the AuthMode to key. This means that any key in the keys array is allowed to publish on that topic. But who determines which keys (peer ids) are allowed to be in that array? Who is the “authority” of key changes?

In pubsub i can’t even figure out who the topic author is, at all. I can see who a “message” author is. It’s the message.from value which is optional. But i can’t figure out who the topic author is.

Anyhow, i’m just going to assume for this post that the first one creating a topic (what creates a topic, a subscribe or a publish?) is in effect the author of the topic. So with that assumption in place…

Currently the AuthOpts looks like this:

	message AuthOpts {
		optional AuthMode mode = 1;
		repeated bytes keys = 2;

		enum AuthMode {
			NONE = 0;
			KEY = 1;
			WOT = 2;
		}
	}
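As a rough sketch of what the KEY mode asks of every receiver: check the publisher's key against the keys array. The Go types below are hand-written stand-ins for the protobuf message, not the real pubsub API, and the WOT mode is left out.

```go
package main

import (
	"bytes"
	"fmt"
)

// AuthMode mirrors the enum in the AuthOpts message.
type AuthMode int

const (
	AuthNone AuthMode = iota // anyone may publish
	AuthKey                  // only listed keys may publish
	AuthWOT                  // web of trust; not modelled in this sketch
)

// AuthOpts is a stand-in for the protobuf message (assumption: keys are raw bytes).
type AuthOpts struct {
	Mode AuthMode
	Keys [][]byte
}

// MayPublish reports whether key may publish under opts.
// The linear scan over Keys is the part that grows with the publisher count.
func MayPublish(opts AuthOpts, key []byte) bool {
	switch opts.Mode {
	case AuthNone:
		return true
	case AuthKey:
		for _, k := range opts.Keys {
			if bytes.Equal(k, key) {
				return true
			}
		}
		return false
	default:
		return false // WOT is out of scope here
	}
}

func main() {
	opts := AuthOpts{Mode: AuthKey, Keys: [][]byte{[]byte("alice"), []byte("bob")}}
	fmt.Println(MayPublish(opts, []byte("alice")))   // true
	fmt.Println(MayPublish(opts, []byte("mallory"))) // false
}
```

Every subscriber has to carry the full Keys slice for this check to work.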

I’m afraid that this won’t scale well. As said before, the document states that the keys of those that are allowed to publish should be stored in the keys array. That likely works fine with small numbers, say 0-100 allowed publishers. But it becomes a lot more data once you expand it into the thousands. A use case for that would potentially be a chat place where you need to “register” to say something. Each of those registrants would then have to be in the keys array.

Furthermore, revoking someone’s access for whatever reason isn’t easily possible in this design. How would you notify all subscribers of a topic that a peer id has been removed from the list? Or added?

But i think there is a solution to solve both problems.

IPNS over PubSub!

First of all, i would change the AuthOpts slightly:

	message AuthOpts {
		optional AuthMode mode = 1;
		string ipnsKeys = 2;
		optional string ipnsKeysAuthor = 3;

		enum AuthMode {
			NONE = 0;
			KEY = 1;
			WOT = 2;
		}
	}

So i’ve added string ipnsKeys and optional string ipnsKeysAuthor.
First for the ipnsKeys. This should just be a single IPNS hash pointing to a document that contains an array of peer ids. This means the topic message won’t grow or shrink in byte size, as the peers that are allowed to publish are now abstracted away in an IPNS record.

And due to it being IPNS, it’s now also possible to revoke access. And to keep these IPNS record updates fast, the user should have IPNS over pubsub enabled.

If the ipnsKeys is of a type that includes the public key already then the optional string ipnsKeysAuthor can be omitted. Otherwise the author’s public key should be added here to verify that the ipnsKeys is created by the ipnsKeysAuthor. (side note, both cases do need an added signature, right? If yes, that would have to be added too as a required property.)

Both those values should be set at topic creation time, whenever that is.
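To illustrate the indirection, here is a minimal sketch of a subscriber that keeps a cached copy of the list behind ipnsKeys and refreshes it when the record changes. The Resolver function is purely hypothetical (it stands in for "resolve this IPNS name and parse the peer id array"); no real IPFS or IPNS API is used.

```go
package main

import "fmt"

// Resolver is a hypothetical stand-in for resolving an IPNS name to the
// current list of peer ids that may publish.
type Resolver func(ipnsName string) ([]string, error)

// KeySet caches the last resolved list so that nothing is resolved per message.
type KeySet struct {
	ipnsName string
	resolve  Resolver
	allowed  map[string]bool
}

// NewKeySet creates an empty set; call Refresh before the first lookup.
func NewKeySet(ipnsName string, r Resolver) *KeySet {
	return &KeySet{ipnsName: ipnsName, resolve: r}
}

// Refresh re-resolves the name, e.g. when an IPNS-over-pubsub update arrives.
// On failure the previous list is kept.
func (k *KeySet) Refresh() error {
	peers, err := k.resolve(k.ipnsName)
	if err != nil {
		return err
	}
	next := make(map[string]bool, len(peers))
	for _, p := range peers {
		next[p] = true
	}
	k.allowed = next
	return nil
}

// Allowed reports whether peerID is in the most recently resolved list.
func (k *KeySet) Allowed(peerID string) bool {
	return k.allowed[peerID]
}

func main() {
	current := []string{"peerA", "peerB"}
	ks := NewKeySet("example-ipns-name", func(string) ([]string, error) {
		return current, nil
	})
	if err := ks.Refresh(); err != nil {
		panic(err)
	}
	fmt.Println(ks.Allowed("peerB")) // true

	current = []string{"peerA"} // the record is republished without peerB
	if err := ks.Refresh(); err != nil {
		panic(err)
	}
	fmt.Println(ks.Allowed("peerB")) // false
}
```

The topic message itself only ever carries the name, so its size stays constant while the list behind it changes.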

There is still an issue though.
All of this still relies on a list of all peerids that are allowed to publish. That is a centralized concept (the list part). We can do better, can’t we? :slight_smile:

We don’t need a list! All we need to know is basically asking “are you allowed to publish?”.
For this to work there needs to be a peer that can - and is allowed to - allow someone access on a topic. Say that peer is the “topic author”. In this case a new arbitrary peer id that isn’t the author should ask the author “can i join”. This too can be done on a side pubsub channel. How i envision it is the peer author to sign the following document structure:

	message TopicApproval {
		// topic to which the peer asks permission
		string topic = 1;

		// peer id of the author of the topic
		string topicAuthor = 2;

		// peer id of the peer asking to join
		string peer = 3;

		// signature of the author for this peer id
		string signature = 4;
	}

The peer that joins must now host that file as its proof to talk on that topic.
The file must then be accessible under: <peerid>/pubsub/topic/<sha256 hashed topic>/proof.
That structure or something alike. When that peer publishes something to that topic, all those that are subscribed can fetch <peerid>/pubsub/topic/<sha256 hashed topic>/proof and verify that the peer is allowed to post. I assume this to be cached so the traffic of these proof messages doesn’t need to be high.
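The approval flow can be sketched with plain Ed25519 signatures from the Go standard library. Several things here are assumptions made for illustration: the signed payload layout (length-prefixed fields), the helper names, and passing the author's public key explicitly instead of deriving it from a peer id.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// TopicApproval mirrors the proposed message; Signature is raw bytes here.
type TopicApproval struct {
	Topic       string
	TopicAuthor string
	Peer        string
	Signature   []byte
}

// NewAuthorKey generates a key pair for the topic author.
func NewAuthorKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(nil) // nil means crypto/rand
}

// payload is what gets signed. Length prefixes are an assumption of this
// sketch; they keep adjacent fields from being confused with each other.
func payload(a TopicApproval) []byte {
	return []byte(fmt.Sprintf("%d:%s|%d:%s|%d:%s",
		len(a.Topic), a.Topic, len(a.TopicAuthor), a.TopicAuthor, len(a.Peer), a.Peer))
}

// Approve is run by the topic author for a peer that asked to join.
func Approve(key ed25519.PrivateKey, topic, author, peer string) TopicApproval {
	a := TopicApproval{Topic: topic, TopicAuthor: author, Peer: peer}
	a.Signature = ed25519.Sign(key, payload(a))
	return a
}

// Verify is run by subscribers: the proof must match the topic and the
// sender, and carry a valid signature from the author.
func Verify(authorPub ed25519.PublicKey, a TopicApproval, topic, sender string) bool {
	if a.Topic != topic || a.Peer != sender {
		return false
	}
	return ed25519.Verify(authorPub, payload(a), a.Signature)
}

// ProofPath builds <peerid>/pubsub/topic/<sha256 hashed topic>/proof.
func ProofPath(peerID, topic string) string {
	sum := sha256.Sum256([]byte(topic))
	return peerID + "/pubsub/topic/" + hex.EncodeToString(sum[:]) + "/proof"
}

func main() {
	pub, priv, err := NewAuthorKey()
	if err != nil {
		panic(err)
	}
	proof := Approve(priv, "chat", "author-peer-id", "joining-peer-id")
	fmt.Println(Verify(pub, proof, "chat", "joining-peer-id")) // true
	fmt.Println(Verify(pub, proof, "chat", "someone-else"))    // false
	fmt.Println(ProofPath("joining-peer-id", "chat"))
}
```

A subscriber only needs the author's public key and the proof itself, so no list of allowed peers has to be kept in sync.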

A downside of this last proposal is that you lose the ability to block peers. I can think of ways to fix that, but to keep this post from getting too long i’ll just call it “to be thought of in the future”.

This discuss board seems fairly low traffic. Therefore i’m just tagging the people i could find in git and here so you’re aware the topic exists.

@whyrusleeping @raul @vyzo @stebalien @vasco-santos (for the first 3, could one of the latter two mention them?)
(i couldn’t find @yusefnapora, @jamesray1, @protolambda in this discuss board)

I’m really looking forward to your thoughts!

Best regards,
Mark

Note: i had to remove links and mentions because i’m newly registered here and can only post 2 of each… -_-

I may be misunderstanding your question, but it seems like your thinking may be contaminated by the client/server world.

This would have to happen at some higher level of orchestration. The authority is left unspecified (i.e., you need to provide your own solution). To illustrate, one approach might be for every peer to connect to a centralized authentication server and pull the latest keys. A more elaborate approach could potentially involve some sort of blockchain, for example.

It’s surprising to me that you care about who the topic author is. This again seems to indicate a slight misalignment in your mental model. By design, a topic exists as long as someone (anyone) is either publishing or subscribed to that topic. The reason for this is that p2p swarms are inherently unreliable, dynamic environments. If you consider the case in which a “topic author” were to “claim” a topic as his own, and then disappear forever, this would effectively make that topic unusable forever. The approach is instead for each peer to have a priori knowledge of which peers they may interact with on a given topic (this is reified by an arbitrary rule with respect to public keys), and to intentionally ignore any notion of who created the topic.

The idea of a topic “creator” is problematic in distributed environments because it implies causality. Since there is no central clock, events can only be partially ordered. As such, it is quite possible that a topic ends up with two creators (or more exactly, that the ordering of the earliest two peers cannot be decided). In other words, there is no guarantee that the phrase “the first one creating a topic” will make sense. At best, such behavior would be approximated with some kind of consensus algorithm, but that is beyond the scope of PubSub.

I think it would be helpful to have some context about what you’re actually trying to achieve. What high-level goal do you have in mind?

Ohh, that is quite likely the case! I try to be very open minded with this new technology, but some concepts are so alien that it’s hard to grasp.

I’d like to quote every part you said and explain it. But it might be easier for me to explain this part, which is the most essential part i think.

I envision a system where you’d have a “federated” pubsub channel. So a channel where only some defined peers are allowed to post.

Conceptually, that’s it.

You can make up a ton of use cases for this. To name a few:

  • a (private) chatroom
  • a channel where nodes can communicate with each other. For example nodes participating in a cluster for IPFS. Or think of nodes that provide a service like Ceramic.

For all those use cases there needs to be a way to limit who can post in that channel. The way i think that’s possible is with a key, a topic author, which puts one node in control of that channel so it can allow others to post in it. This concept i’m describing here is simply the best i can think of. I’d love to hear some other concept that achieves the same without pointing to one specific node.

In general, and i’m repeating myself, i’m searching for a way to limit who can post to a channel and to define who or what sets that limit.

I don’t care, really. It’s just the only way i can think of to make the above concept work. But again, please do tell how it could work without a need to set an author!

I’m looking forward to your reply! :slight_smile:

You can easily do that with a custom validator.

Assuming you have some out of band means of learning who is allowed to publish, and message signing enabled, you can filter and only accept messages from blessed authors in the validator.

– vyzo
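A minimal sketch of the validator idea, assuming the pubsub layer has already verified the message signature so that From can be trusted. Message, Validator, AllowAuthors and Deliver are stand-in names for this example and not the go-libp2p-pubsub types; in a real node the same check would live in a per-topic validator callback.

```go
package main

import "fmt"

// Message is a stand-in for a signed pubsub message whose signature has
// already been checked (message signing enabled).
type Message struct {
	From string // peer id of the author
	Data []byte
}

// Validator decides whether this node accepts a message.
type Validator func(Message) bool

// AllowAuthors returns a validator that accepts only the blessed authors.
// How the list is learned (out of band) is outside this sketch.
func AllowAuthors(blessed ...string) Validator {
	set := make(map[string]struct{}, len(blessed))
	for _, id := range blessed {
		set[id] = struct{}{}
	}
	return func(m Message) bool {
		_, ok := set[m.From]
		return ok
	}
}

// Deliver passes only accepted messages on to the application handler.
func Deliver(v Validator, msgs []Message, handle func(Message)) {
	for _, m := range msgs {
		if v(m) {
			handle(m)
		}
	}
}

func main() {
	v := AllowAuthors("peerA", "peerB")
	inbox := []Message{
		{From: "peerA", Data: []byte("hello")},
		{From: "peerX", Data: []byte("spam")},
	}
	Deliver(v, inbox, func(m Message) {
		fmt.Printf("%s: %s\n", m.From, m.Data) // only peerA's message is printed
	})
}
```

Rejected messages are simply dropped on the receiving side; nothing stops peerX from publishing, its messages are just never accepted.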

It’s perfectly understandable! Distributed systems are unintuitive, and pure p2p systems are in some ways even more backwards.

That actually sounds like a good idea! (But this condensed version is a fine place to start :slightly_smiling_face:)

Okay! The good news is that this ought to be achievable – it will just require the usual back-asswards approach of distributed computing, and may be more or less complicated depending on exact requirements. :laughing:

First, a quick clarification. When I hear “federated”, I think of a hierarchical structure with more than one root. In other words, I expect a relatively “small” number of peers that must interact without central coordination, but I expect these “root” peers to act precisely as central coordinators for all the other peers (which are effectively clients). Websites, email, matrix, etc are all things that come to mind when I hear “federated” (although most of these are not built on top of p2p overlays, of course).

Are we on the same page?

In the client/server world, you’re probably used to modeling authority as allowing or preventing a thread of execution from doing something. In the p2p world this is often not possible, because you have no control over remote peers. The approach is therefore inverted: you model authority in terms of whether or not peers should accept a result.

At this point it becomes a bit difficult to give actionable advice because some design decisions need to be made. For example, what is the communication pattern on a given topic? Is there just one publisher, but anyone is free to subscribe? Or is it the case that several root peers must be able to publish and/or subscribe to a given topic?

In general terms, your problem is going to be one of distributed consensus. You’re going to have to find a way to get multiple root peers to agree on what kinds of messages to accept from whom. With this in mind, there are two possibilities:

  1. Root peers inherently trust each other → classical algorithms, e.g. Raft & Paxos
  2. Some root peers may be malicious → blockchain-based algorithms (but this needn’t scare you; there are simple approaches).

Hope this helps! Please feel free to provide some more details, and I’ll help as best I can :slight_smile:

this:

and:

I could be wrong, but that means one of two things.

  1. Either each receiving node would need to maintain a list of all the parties who are allowed to post. This is a workable system but doesn’t scale, as your list potentially gets very large. For a chatbox with thousands of people, this already doesn’t scale well.
  2. Or a third-party service (like a blockchain) can be used and queried to ask whether node X is a “blessed author”. That works too, but it requires an external third party, which again isn’t ideal as it becomes a single point of failure.

Besides that, a filter has the downside of being a per-node thing. How would you handle a change in that filter across nodes? This isn’t ideal at all. It would be much better if you could ask the message sender whether it was allowed to make that post: “hey, were you allowed to do that? No? Message blocked. Yes? Pass.” This could for example be some kind of proof that the sender needs to provide, which allows the receiver to verify that the node was authorized.

Your overall impression is correct.

The one nuance I would add is that a validator rule isn’t necessarily equivalent to hard-coding the list of blessed authors. The validator can make a method call to the effect of “hey, was this allowed?” for each message.

In other words, the validator is the caller; it is the thing that implements authority by deciding whether or not to accept a message. I get the sense that you’re actually asking how the callee could work. The answer to that question depends on some of the specifics in my previous post.

For the avoidance of doubt: the per-node approach is your only alternative if you don’t want to rely on a centralized service of some sort. For our purposes, “authority is acceptance” and “per-node validation” are pretty much synonyms.

What you’re really asking about is “how can each node have a consistent view of the global state of peer permissions?”. The answer to that is either:

  1. A central authority
  2. Distributed consensus
    a. Non-Byzantine (all nodes are trustworthy)
    b. Byzantine (some nodes may be malicious)

Sounds like we are on the same page!

Ah right, that would be the case when going fully distributed. And then a form of consensus would be needed. While that is cool, it’s not exactly what i’m after. I’ll just use the private chatbox example again. In a “web 2.0” world you would have to register and/or get special permission to chat in that box. In the distributed world your “peer id”, if you will, would have to “get permission” to do the same thing.

I actually think that i might have answered myself in my reply to @vyzo which i just now realize when writing this reply. I said:

Would that work?
Would that allow for a mechanism where, say, 1 specific peer can authorize any number of peers? This “authorization” is only enforced by the subscribed nodes checking it. This is basically encryption and checking signatures. Note that the “1 specific peer” could also be an IPNS hash or a wallet address, etc…

I’m a bit mixed about the callee and caller concept in this regard. I’d say that the one that posts a message is the caller and the one that listens (subscribes) is the callee. In this scenario i was looking at how the one who subscribes can verify incoming messages. Please, let’s use “publish” and “subscribe” in this context, not “callee” and “caller”, it’s confusing enough as is :wink:

Caller/Callee is distinct from Publisher/Subscriber. I’m using the former terms in the context of a validator. My point is that the validator (which is a function) can potentially call another function (the callee), which in turn may do arbitrary things. As such, using a validator is just a hook. My point was simply that a validator can use dynamic logic, and doesn’t amount to hard-coding permissions.
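The caller/callee split can be sketched like this: the validator (caller) holds no permissions itself and asks an Authorizer (callee) for every message. All names are illustrative, and the in-memory callee is only one possibility; a central service or a consensus layer could sit behind the same interface.

```go
package main

import (
	"fmt"
	"sync"
)

// Authorizer is the callee: anything that can answer "was this allowed?".
type Authorizer interface {
	IsAllowed(topic, peerID string) bool
}

// MemoryAuthorizer is a toy callee whose answers can change at runtime.
type MemoryAuthorizer struct {
	mu      sync.RWMutex
	allowed map[string]map[string]bool // topic -> peer id -> allowed
}

// Grant allows peerID to publish on topic.
func (a *MemoryAuthorizer) Grant(topic, peerID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.allowed == nil {
		a.allowed = make(map[string]map[string]bool)
	}
	if a.allowed[topic] == nil {
		a.allowed[topic] = make(map[string]bool)
	}
	a.allowed[topic][peerID] = true
}

// Revoke removes the permission again.
func (a *MemoryAuthorizer) Revoke(topic, peerID string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	delete(a.allowed[topic], peerID)
}

// IsAllowed answers the validator's question for one message author.
func (a *MemoryAuthorizer) IsAllowed(topic, peerID string) bool {
	a.mu.RLock()
	defer a.mu.RUnlock()
	return a.allowed[topic][peerID]
}

// NewValidator returns the caller: a hook that defers every decision.
func NewValidator(topic string, auth Authorizer) func(from string) bool {
	return func(from string) bool {
		return auth.IsAllowed(topic, from)
	}
}

func main() {
	auth := &MemoryAuthorizer{}
	validate := NewValidator("chat", auth)

	auth.Grant("chat", "peerA")
	fmt.Println(validate("peerA")) // true

	auth.Revoke("chat", "peerA")
	fmt.Println(validate("peerA")) // false
}
```

The validator code never changes when permissions do; only the state behind the Authorizer does, which is where the consistency question moves to.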

Technically yes, but I’m skeptical that a naive approach is sufficient for your needs.

This is hard to answer because I don’t know if we’re talking about root peers or “client” peers. Can you give us a detailed scenario for this?

It seems like at the end of the day, you have to get every validator in every peer to agree on the (changing!) acceptance criteria for messages. If that’s the case, you have only two solutions: centralization or distributed consensus.

Ha, i think we’re actually talking about both root and client peers :slight_smile:

Let me first make clear that i have no direct need for any of this. I have ideas and plans, yes. And i try to map those ideas to a fully distributed setup to be rid of the dreadful single points of failure. These concepts would very likely be beneficial to anyone exploring ideas to go to a fully distributed setup. That could be a site, a desktop application or whatever else. Anything really.

Remember my examples given previously:

  • a (private) chatroom
  • a channel where nodes can communicate with each other. For example nodes participating in a cluster for IPFS. Or think of nodes that provide a service like Ceramic.

Those boil down to the same concepts too.
I thought those would clear up any uncertainty? Anyhow, here’s a very detailed description of something i am actually planning on making.

Some day i’d like to create a service like “Amazon Lambda”, only in a fully distributed manner. The idea is that a user can send a piece of code to this distributed network of nodes. The user gets a response back with the result of the execution. This effectively gives the user the option to run “server side tasks” that you’d normally do on a server but can’t do in a distributed network (yet). Good examples are uploading and processing photos/videos for storage, parsing combinations of documents that would be impractical on the browser side, etc, etc, etc…
How it would work is a custom piece of software runs on X (say 100) nodes.
The nodes communicate with each other over pubsub.
The nodes also accept user requests, also via pubsub.

Nodes communicating with each other
This is where this whole idea of permission becomes important.
I want each node to be able to verify that a communication from another node is a valid node participating in this network.
It’s fine if that verification relies on a hardcoded (say via IPNS to still allow changes) key.
Each node should not have a list of nodes it can communicate with. A potential list could only get outdated and cumbersome to keep in sync. A node should just “try” communication (over pubsub) and just fail if there is no response within a second or so.
Adding a new node should be as simple as deploying another node; every node has the exact same configuration.
This way you can build a sort of “trusted network of nodes”. But you can just as easily use this very concept for a private chatbox where a node would then be a peerid (a node is essentially a peerid too).
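The “just try and fail within a second or so” behaviour could look roughly like the sketch below. The send function and the replies channel are placeholders for publishing to, and reading from, a pubsub topic; only the timeout handling is the point here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// DefaultTimeout is the "second or so" a node waits for an answer.
const DefaultTimeout = time.Second

// ErrNoResponse is returned when nobody answered in time.
var ErrNoResponse = errors.New("no response before timeout")

// Request sends a message and waits for the first reply, or gives up after
// timeout. The caller keeps no list of peers; silence is treated as failure.
func Request(send func(), replies <-chan string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	send()
	select {
	case r := <-replies:
		return r, nil
	case <-ctx.Done():
		return "", ErrNoResponse
	}
}

func main() {
	replies := make(chan string, 1)

	// Some node answers: the reply is returned.
	r, err := Request(func() { replies <- "pong" }, replies, DefaultTimeout)
	fmt.Println(r, err)

	// Nobody answers: the request fails once the timeout has passed.
	_, err = Request(func() {}, replies, 100*time.Millisecond)
	fmt.Println(err)
}
```

Whether a reply may be trusted is a separate question; that is what the verification against a shared key would cover.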

Accepting user requests
For this too there would be a pubsub channel. This would be a bit more elaborate as there would be a sort of API over pubsub to let a user register itself in the network of nodes. Handle payments if applicable. Cache results on IPFS.

All of the above is, again, just as easily applicable to a chatbox over pubsub in my opinion.

You could consider all these nodes to be “trusted” or “root” nodes. They should know nothing of each other but are able to communicate with each other.

Anyhow, i really hope this paints a more clear picture of the concept i try to make possible in pubsub.