During the OpenStack Summit in Paris this last week, we made a concerted effort to finally migrate the openstack-infra projects over to StoryBoard. This is a pretty big milestone for us, because it’s the first real set of users that we’ve had on our system – basically our beta users. Of course the best laid plans ran into some problems, one of which is forcing us to make a decision on how to handle user identity. What follows is my personal opinion on where we are, where I’d like to see us go, and what I feel would take us to get there.
Our original data source (pre-migration) permits permits duplicate user names, which results in “duplicate” user records. In some cases this is intentional, as users wish to retain supplemental identifiers (such as IRC nicks) on their user accounts, while being able to clearly separate contributions made as an agent of an organization (ex: employer), from contributions made on their own behalf.
StoryBoard, in contrast, does not permit duplicate user names, which has raised the question on what to do during data import. Should we permit duplicate user names, and risk creating zombie users? Should we prompt the admin during import to decide whether to create a user or link a user? In that case, what do we do about the extra OpenID, do we permit two different users to log in as one user?
This, and many other edge cases, makes this one of those hairy problems legendary for causing technical debt, so rather than trying to patch the problem right now, it behooves us to consider where we want to end up, and take the minimum number of steps towards that goal which also solves our immediate problem.
The Long-Term Goal
My long term goal with StoryBoard is twofold: Firstly, I want our authentication system to be n-pluggable, so that an install can permit their users to authenticate against multiple Authentication providers. This is the “Log in with Facebook/Google/OpenID” story, and while I anticipate that the vast majority of users will only use one single login method, the ability to link multiple providers is critical in complex organizational structures as well as handling legacy auth migration cases.
Secondly, I want users to have the power to declare their own identity within the system, with as minimal fuss as possible. The best way to describe this is as Google Auth’s multiple login, where a user may switch their identity within the same browser session.
The current design of StoryBoard’s authentication contains two portions: Authentication and Authorization. The first, Authentication, is the to-be-pluggable system I referred to earlier, whereby a user’s identification is delegated to a trusted third party. Once we have received a user’s identity from the remote system, we attempt to resolve that user against our local database, creating a new record if necessary.
The second part, Authorization, mirrors the standard OAuth/OpenID flow of establishing a trust relationship with the browser (to the best of our ability, anyway) and issuing the user an API Bearer token they can use to make queries with.
The entire system right now is hard-coded to only accept a single OpenID provider, and our design has only been tested against Launchpad/Ubuntu One.
To get from ‘Where we are’ to ‘Where we want to be’, we then need to identify the necessary delta to our code, data, and functionality.
- To enable a user to link to multiple authentication providers, we must be able to store N remote ID tokens (such as an OpenID) per user.
- To enable multiple authentication plugins, we must use a stevedore-like plugin discovery mechanism to allow multiple plugins to be loaded. Similar code is already in place, and should be relatively easy to copy.
- To enable a user to choose how they wish to authenticate, we must permit some form of authentication discovery, where the API informs the user what authentication options they may have.
- In order to permit user discovery by multiple different parameters (email, irc handle, etc), we must permit a user record to reference multiple email addresses and multiple IRC handles, which MAY conflict.
- To permit multi-session in the browser, the web client should store multiple user auth tokens, and permit a user to ‘switch’ between which one is being used.
- In order to display the name of the actor in the UI, providing a consistent display name is necessary. While we can update this display name by whatever comes back from our remote auth providers, may cause caching problem when a user’s display name changes from system to system.
- In order to permit the normalization of user records and allow zombie removal, we must permit a user merge activity that can support two use cases: Firstly, in the case that a user retains the ability to identify as both users to be merged, a self-serve merge process by which identity control is verified and resolved. Secondly, in the case where a user record has become truly orphaned from an authentication system, an administration action that permits a brute-force user merge.
With the above “eventual” features in mind, it quickly becomes clear that the users table in storyboard currently contains too much data. Given a 1-to-n mapping on email addresses, user names, OpenID’s, and display names, removing these into their own tables reduces our user table to little more than an ID and a login timestamp.
Back to the problem
With the above in mind, it becomes clear that there is no real benefit to maintaining a uniqueness constraint on the username column, as it provides no real useful data. IRC handles – one of the goto identifiers in OpenStack, do not cleanly map 1-to-1 with actual user records, as a particular person might be acting for different agents. Thus it is actually a benefit for us to permit duplicate usernames.
By lifting the uniqueness constraint, we both fix our immediate problem, and take a step in the correct direction for our optimal system.