JavaScript Dependency Management in OpenStack

A problem that I’ve been working on the last week has been JS dependency management – driven by npm and bower – inside of OpenStack. To be honest, this problem can be extended to JS dependency management in any application that wants to be packaged within a Linux distribution, as that is the ultimate bar that needs to be met. In this case, however, we’re just going to focus on OpenStack. And, to narrow things down even more, we’re only going to focus on front-end, bower-driven dependencies, used during runtime of the front-end.

To be clear: We are not talking about which tools to use. We are talking about making a javascript/html project’s source code both trustworthy enough for packagers, while providing access to the npm-driven toolchain preferred in this community.

Note: I anticipate updating this post to make more recommendations as I build out Ironic’s Webclient. Stay tuned.

Bower: Commit your dependencies

TL/DR: The ultimate recommendation to the OpenStack community is to use project-appropriate tools to resolve dependencies, but to ultimately commit them to source. For python, you might use something like bower.py. For NPM/Javascript projects, I personally recommend main-bower-files as demonstrated in this gulp file.

Requirement: Builds must be deterministic

Packagers’ customers are banks. Governments. Large corporations. Entities which need to ensure that the software they’re running can be signed and verified, and there are significant dollar values rolled up in SLA’s to ensure this. There’s lots of policies in place for this, some of which seem so draconic as to be beyond unreasonable. If you’re curious, I recommend reading up on PCI compliance. It all makes sense, once you realize that it’s possible to guess a password from the return speed of an error response.

In the world of packaging, this means that builds must be deterministic: If you run the build different times, the output must be exactly the same. If you can’t do that, you can’t md5 or sha1 sum the results for verification, and suddenly the packager is on the hook for the next big security breach.

Fact: Bower is not deterministic

Bower’s pretty neat. It is a registry, rather than a repository, so it only provides the address of where you can get a library, rather than providing the package itself. In the vast majority of cases, this means that bower will point you at a git repository, from which the command line client then extracts the tags as versions. This is pretty awesome, because it means that you can make github host your repository for you.

Yet…. git lets you go back in time and rewrite history. While awesome, this means that bower itself does not provide a deterministic way of resolving a dependency, and therefore cannot be used by packagers. Yes, you can cache bower and the git/svn repositories that it links to. In fact, I wrote a bower-mirror puppet module that will build a server for you that does just that. That does not solve the problem of git being non-deterministic though. As long as a library’s primary source is a git tag, you can’t trust it.

Solution: Use bower anyway

Wait, what? No, I’m serious. Fact is that bower is the de-facto dependency registry for front-end development. We should use it, because it’s an awesome tool. We should also ensure that our builds are deterministic, which means that bower should not be run as part of a build, and should only be used to assist in resolving and committing dependencies.

There is precedent: The NPM documentation itself recommends that you commit all your dependencies, a fact that came out during the SSL Debacle of 2014. Yet even without this recommendation from the JavaScript community itself, there is precedent in OpenStack via the oslo-incubator libraries. Since they are libraries in incubation, they are directly copied and committed into a target project, rather than using pip.

How do you do this? Well, that’s up to you. If you’re a mostly-python project that wants to use the bower registry but is allergic to node, then I’d suggest something like bower.py. If instead you’re using the NPM toolchain, something like the ‘update_dependencies’ target in this gulpfile should work for you.

Goodbye Launchpad, Hello Storyboard

The OpenStack Infrastructure team has successfully migrated all of the openstack-infra project bugs from LaunchPad to StoryBoard. With the exception of openstack-ci bugs tracked by elastic recheck, all bugs, tickets, and work tracked for OpenStack Infrastructure projects must now be submitted and accessed at https://storyboard.openstack.org. If you file a ticket on LaunchPad, the Infrastructure team no longer guarantees that it will be addressed. Note that only the infrastructure projects have moved, no other OpenStack projects have been migrated.

This is part of a long-term plan to migrate OpenStack from Launchpad to StoryBoard. At this point we feel that StoryBoard meets the needs of the OpenStack infrastructure team and plan to use this migration to further exercise the project while we continue its development.

As you may notice, Development on StoryBoard is ongoing, and we have not yet reached feature parity with those parts of LaunchPad which are needed for the rest of OpenStack. Contributions are always welcome, and the team may be contacted in the #storyboard or #openstack-infra channels on freenode, via the openstack-dev list using the [storyboard] subject, or via StoryBoard itself by creating a story. Feel free to report any bugs, ask any questions, or make any improvement suggestions that you come up with at: https://storyboard.openstack.org/#!/project/456

We are always looking for more contributors! If you have skill in AngularJS or Pecan, or would like to fill in some of our documentation for us, we are happy to accept patches. If your project is interested in moving to StoryBoard, please contact us directly. While we are hesitant to move new projects to storyboard at this point, we would love working with you to determine which features are needed to support you.

Relevant links:

StoryBoard Authentication and Authorization

During the OpenStack Summit in Paris this last week, we made a concerted effort to finally migrate the openstack-infra projects over to StoryBoard. This is a pretty big milestone for us, because it’s the first real set of users that we’ve had on our system – basically our beta users. Of course the best laid plans ran into some problems, one of which is forcing us to make a decision on how to handle user identity. What follows is my personal opinion on where we are, where I’d like to see us go, and what I feel would take us to get there.

Problem Summary

Our original data source (pre-migration) permits permits duplicate user names, which results in “duplicate” user records. In some cases this is intentional, as users wish to retain supplemental identifiers (such as IRC nicks) on their user accounts, while being able to clearly separate contributions made as an agent of an organization (ex: employer), from contributions made on their own behalf.

StoryBoard, in contrast, does not permit duplicate user names, which has raised the question on what to do during data import. Should we permit duplicate user names, and risk creating zombie users? Should we prompt the admin during import to decide whether to create a user or link a user? In that case, what do we do about the extra OpenID, do we permit two different users to log in as one user?

This, and many other edge cases, makes this one of those hairy problems legendary for causing technical debt, so rather than trying to patch the problem right now, it behooves us to consider where we want to end up, and take the minimum number of steps towards that goal which also solves our immediate problem.

The Long-Term Goal

My long term goal with StoryBoard is twofold: Firstly, I want our authentication system to be n-pluggable, so that an install can permit their users to authenticate against multiple Authentication providers. This is the “Log in with Facebook/Google/OpenID” story, and while I anticipate that the vast majority of users will only use one single login method, the ability to link multiple providers is critical in complex organizational structures as well as handling legacy auth migration cases.

Secondly, I want users to have the power to declare their own identity within the system, with as minimal fuss as possible. The best way to describe this is as Google Auth’s multiple login, where a user may switch their identity within the same browser session.

Current Design

The current design of StoryBoard’s authentication contains two portions: Authentication and Authorization. The first, Authentication, is the to-be-pluggable system I referred to earlier, whereby a user’s identification is delegated to a trusted third party. Once we have received a user’s identity from the remote system, we attempt to resolve that user against our local database, creating a new record if necessary.

The second part, Authorization, mirrors the standard OAuth/OpenID flow of establishing a trust relationship with the browser (to the best of our ability, anyway) and issuing the user an API Bearer token they can use to make queries with.

The entire system right now is hard-coded to only accept a single OpenID provider, and our design has only been tested against Launchpad/Ubuntu One.

Identifying Delta

To get from ‘Where we are’ to ‘Where we want to be’, we then need to identify the necessary delta to our code, data, and functionality.

  • To enable a user to link to multiple authentication providers, we must be able to store N remote ID tokens (such as an OpenID) per user.
  • To enable multiple authentication plugins, we must use a stevedore-like plugin discovery mechanism to allow multiple plugins to be loaded. Similar code is already in place, and should be relatively easy to copy.
  • To enable a user to choose how they wish to authenticate, we must permit some form of authentication discovery, where the API informs the user what authentication options they may have.
  • In order to permit user discovery by multiple different parameters (email, irc handle, etc), we must permit a user record to reference multiple email addresses and multiple IRC handles, which MAY conflict.
  • To permit multi-session in the browser, the web client should store multiple user auth tokens, and permit a user to ‘switch’ between which one is being used.
  • In order to display the name of the actor in the UI, providing a consistent display name is necessary. While we can update this display name by whatever comes back from our remote auth providers, may cause caching problem when a user’s display name changes from system to system.
  • In order to permit the normalization of user records and allow zombie removal, we must permit a user merge activity that can support two use cases: Firstly, in the case that a user retains the ability to identify as both users to be merged, a self-serve merge process by which identity control is verified and resolved. Secondly, in the case where a user record has become truly orphaned from an authentication system, an administration action that permits a brute-force user merge.

With the above “eventual” features in mind, it quickly becomes clear that the users table in storyboard currently contains too much data. Given a 1-to-n mapping on email addresses, user names, OpenID’s, and display names, removing these into their own tables reduces our user table to little more than an ID and a login timestamp.

Back to the problem

With the above in mind, it becomes clear that there is no real benefit to maintaining a uniqueness constraint on the username column, as it provides no real useful data. IRC handles – one of the goto identifiers in OpenStack, do not cleanly map 1-to-1 with actual user records, as a particular person might be acting for different agents. Thus it is actually a benefit for us to permit duplicate usernames.

By lifting the uniqueness constraint, we both fix our immediate problem, and take a step in the correct direction for our optimal system.