OIDC Federation on a Grand Scale

Content Warning: Identity Nerd, snapshot 2023.

The OIDC Federation Specification (draft) has been stuck in my brain for about a year now because it changes how I think about my job. On its surface, the concept is relatively simple: Provided that there is a federation of actors, it provides a method by which these actors can establish a common trusted root actor. A simple example might be that I claim that I’m Michael, but you have no reason to trust me. Still, we can both agree to trust the IRS (for better or worse) about that claim.

In true specification style, that’s where it ends. It doesn’t talk about what you do with that common trust, nor does it talk about how a federation is established. A conversation with Mike Jones at IIW Nov ’22 revealed that the intended use case was Academia; that the legal contracts established between universities to share research resources can be converted into a federation, and that this federation can then grant one professor access to, say, a time on a device halfway around the world.

Federation in a Data Center

At VMware, this problem becomes really, really interesting. Any piece of metal can host multiple different leaf nodes, and different authorization paths. An actor such as myself needs to access the metal itself, the datacenter’s management plane, a running VM, or even a workload running on that VM. Each of these leaf nodes will have its own authorization requirements, ranging from 2FA to demonstrable physical presence, and having to sign into multiple different systems to complete your work can be cumbersome.

As you can see, this can very rapidly expand into thousands or hundreds of thousands of different nodes in this federation. The Federation specification provides a way by which each of those nodes can easily join that federation.

Gotcha’s

A magical world where every piece of metal that is slotted into a rack can automatically configure itself and join a federation is likely still the province of private implementations. Even so, this and related specifications provide a zero-configuration path forward, as it leans heavily on other automatic configuration RFCs that allow each node in the federation to register itself. Using Dynamic Client Registration a new node can attach itself to the federation, presuming that its authorization server trusts it.

Perhaps more importantly, there is too much vagueness in other specs like the Token Exchange RFC, so we can’t yet answer the question: “If I’m already authorized with one leaf node, how do I convert that to authorization to another leaf node?”. It’s unlikely that a leaf node knows the exact nuances of the OIDC implementations (or versions) active in the federation.

However, the critical core of the Federation specification is there: Any given node does not have to know about all the other nodes in the federation; it just has to know about its peers, and it can discover everything else from there.

EV’s, and the return of the Great American Road Trip

My family and I took a well-earned road trip back in the spring; two days in Portland, after which we picked up the kids and took them up to the Olympic Peninsula. We drove our EV, bought as a splurge after an unfortunate accident in February resulted in a car-shopping scramble. The trip itself was wonderful, yet more importantly, it made me really think about American Car Culture, and my own history.

If you’ve watched the Pixar movie ‘Cars’, it builds strong nostalgia for how people “used to” travel in cars. They’d drive, stop, look around, spend some money, then keep going. As a kid, I caught the tail end of that culture. Every summer we would pile into our excessively large Buick Electra, and drive to all kinds of places: Texas, New Mexico, Arizona, Minnesota, Louisiana, Utah, hitting ever major park along the way. This is likely because my parents had three kids, and regularly kicking us out of the car to explore was a way to maintain a small measure of sanity. To me, however, I got to see the changing countryside and experience it … more slowly, and more thoroughly.

As I moved to college and bought my own car, I had lost that sense of wonder. It was more about getting to a location as quickly as possible, sticking to highways and bypassing every town along the way. I recall one roadtrip from Pittsburgh to Texas that I did in 3 days of unbroken driving, stopping only for gas and sleep.

As we attempted something similar with our EV, I could’t help noticing that the need to charge the car forced us back to a more leisurely pace. We would drive for about 2 hours, then be forced to take at least a 30 minute break before moving on. Gone were the days where we could gas up and go, putting 10 hours of straight driving on the clock to cover some real distance. Instead, we stopped wherever we could find a charger, strongly preferring locations that had something to do, something to see, or both.

This was tricky; finding that magic intersection of “Car Charger” and “Walkable Family-Friendly Location” takes some real effort. We found some gems: Finn River Cidery, for example, has two chargers as well as a pizza oven, playground, and live music on weekends. Downtown Port Angeles has a few chargers and a whole lot of pedestrian charm. The road from Seattle to Portland, however, found us at a run-down, mostly empty mall.

Conventional economic wisdom, of course, means that this limited practical range of EV’s marks them as ineffective road-tripping cars. Thus, chargers are more likely to be seen at local locations, such as shopping malls and grocery stores. And, speaking for myself, I have no desire to roadtrip if I have to stare at the outside of a Walmart every two hours.

And yet, EV’s are becoming the norm; there’s no way to avoid this. As the charging network expands across the United States, I see real opportunity for smaller communities and national parks to recapture some of that travel nostalgia. Put a few chargers near a quaint downtown, and you’ll create a magnet for EV drivers. Allow charging at your national park, you’ll get more visitors who will take a break to explore, before moving on. 

I’m ready to be one of them.

How to write a Recommendation

Here’s a quick, three-paragraph template for writing a letter of recommendation. Don’t overthink it; these should be quick to read, and leave a strong positive feeling on behalf of the candidate.

Paragraph 1: What are you doing, and why?

The opening is a simple statement of what you are doing: “I am recommending [candidate] for [promotion/acceptance] to [level/program].” This is then followed by your reasoning; choose one or two qualities that represent the program they are applying for, and give an example about how they express this quality.

You don’t need to go into explicit detail; remember that this letter is part of a larger application, and your goal is simply to add weight to the overall packet.

Paragraph 2: How does the organization benefit?

Next, detail one or two clear benefits to the program for accepting this candidate. Unambiguously tie success directly to this applicant; the goal is to create organizational incentive for accepting the candidate.

Be direct; unambiguously answer the question: “What do we gain if we accept this candidate”, and make it something valuable.

Paragraph 3: How does the candidate benefit?

The last paragraph should be about the candidate, and how / why they will benefit from being accepted. Explain how they are motivated and engaged for personal reasons, and will be an active participant in the program. Nobody likes a freeloader.

Done? Share it with the candidate!

Unless there are real reasons to keep your identity private from the candidate, share your recommendation with them. Everyone enjoys recognition, and it will only serve to strengthen your relationship with them in the long run.

Free (as in Tier) OAuth2

Are services like Auth0 or Okta really worth paying for? For a business, perhaps; the overhead of paying for an auth-focused software engineer, as well as the operational overhead of monitoring, could very well be more expensive than handing over a credit card. However, if you have made the decision to host your own, it turns out you can do so on AWS’s Free tier, with only a few strategic technical choices.

Counting the API requests

The reason this is so cheap, is that authorization – especially considering the advent of signed tokens – only requires a few API requests. Let’s count the number of API requests required for a typical user session of 30 minutes:

  1. One call to /.well-known/openid-configuration to read the oauth2 configuration.
  2. One call to /jwks.json to read the signing key set.
  3. One call to the /oauth/authorization endpoint to initialize the authorization flow.
  4. One call to a /oauth/callback endpoint, usually triggered by a third party IDP (sign-in via google)
  5. One call to the /oauth/token endpoint, to issue the first token.
  6. One additional call to the token endpoint as long as the user session persists, to refresh tokens.

As you can see, initializing a session takes 5 calls, while maintaining a session takes one call for however long your Access Token TTL is set. I personally prefer mine at 5 minutes, so let’s work with that. All together, assuming an average 30 minute session working in your app, is 10 API calls. Note that machine clients only need 3 API calls to initialize a session, as the callback is not needed.

The AWS Free Tier

The “cheapest” architecture to use in AWS to publish an API is a combination of the API Gateway, Lambda, and DynamoDB. Speaking practically, the true cost limit of this architecture are the gateway and lambda limits, which in both cases are set to 1,000,000 (one million requests per month).

Putting it together, assume 10 API calls per user, and 30 days in a month, results in ~3333 active, daily users. Increasing the token duration to 10 minutes increases that to 4761. Each additional batch of users will then cost you ~$1.20 per month, which grows linearly until you hit usage tiers.

When to switch from Lambda?

Eventually, you’ll build enough traffic that you may consider buying something, or hosting an EC2 instance and/or container. 1,000,000 requests per month result in about 23 requests per minute, which most servers can handle swimmingly. Furthermore, a small AWS EC2 reserved instance will cost you about $23/month (speaking from how I’m hosting this blog), so the cost/benefit tradeoff is 23/1.20 + 1, so 20x times what lambda would handle. In short, 60K daily active users and a total request rate of 430 requests per minute, or 7.1 requests per second.

But what does it cost to build one?

There are many open source OAuth2 servers out there, and if you choose to go that route you might as well pay for the “exorbitant” $23/month fee from AWS.

However, speaking from experience, a simple STS service which does not have its own login, performing authorization against a third party like Google or Facebook, takes about a week to build and deploy. The complexity comes from building a tenancy model that allows different organizations to sign up and have an isolated permission management API; so if you’re creating a B2C service, it’s relatively quick. B2B is more complex, however in this arena chances are the off-the-shelf solutions out there won’t exactly match your business needs.

Running a Successful Bug Bash

A bug bash, much like any coordinated effort, requires planning. Here’s a guide on how to get the most out of everyone’s time.

Preparation

Step 1: Collect your use cases

This is where you scour all the documentation that has been generated – via wiki, tickets, discussions, meetings, designs, etc – and capture the use cases that your feature needs to satisfy. In a perfect world, this is a collation effort; in an imperfect world, you’ll be writing them by exploring your application.

Step 2: Write Scripts

Every use case above needs to be distilled into a simple, easy to follow script for an uneducated user from a different team to follow. Assume that they have zero context, and that they only read english (and don’t understand nerd). Each script should complete a single task, and should take no longer than 10 minutes to complete. This time constraint gives each participant space to explore your feature and discuss what they’re seeing.

The purpose of the scripts is twofold: First, it focuses the attention of attendees. Secondly, it removes discussion about ancillary topics such as “how do I log in”, and generally makes everyone productive.

Step 3: Prepare meeting resources

You will need the following resources before the bug bash.

  • A home, or identification schema, for tickets created during the bug bash. These will come hard and fast, so make sure you either have a parent ticket, or a tagging scheme, ready beforehand. It can be as simple as the “bug bash” tag.
  • A backchannel. Whether a chat channel, a video conference, or smoke signals, you will need a place for participants to discuss what they’re encountering.
  • A Ticket format, which participants will use to report issues discovered in the bug bash. Each should include:
    1. The name of the script
    2. The step in which the issue was encountered
    3. The name of the person who found it if other than the reporter
    4. Whether the reporter thinks it’s a release requirement or not
    5. Expected behavior
    6. Actual behavior

Step 4: Prepare testing resources

In order to satisfy the constraints of each script, you need to create the input resources necessary to complete that script. This might include:

  • User accounts with specific access rights.
  • Images to upload into the system.
  • API Keys to access the system with
  • Postman collections by which the system can be tested
  • Prepopulated integrations (if your product posts to slack, for instance)

Each of the above should be directly added to the relevant script.

Step 4: Appoint a moderator

The moderator’s job is to answer functional questions while the participants are following their scripts, and to keep the conversation from going into the weeds and distracting other participants. Before the bug bash, make sure that this individual is deeply familiar with the feature.

You may also want to appoint a scribe, who’s job is not to file all the tickets, but to have a deep understanding of outstanding work and thus intercept duplicate issues in real time.

During the Bug Bash

  1. Start promptly. End promptly.
  2. Demonstrate how to report an issue using the above scheme. Remind everyone that you can’t read their minds, so please document each issue extensively.
  3. The strategy for a bug bash is “report as many issues as possible”, duplicates will be sorted out later.
  4. Assign scripts. Every participant should get 1 script for every 30 minutes of time scheduled, and try to get each script assigned to at least two participants. If a participant finds they have extra time, they are welcome to explore.
  5. Go find your bugs!

After the Bug Bash

Immediately after the bug bash (to avoid forgetting details), the moderator and any scribe should triage all the issues that were reported. Each issue should be evaluated for:

  • An accurate, reproducible description of the issue.
  • Duplication (best description wins)
  • Priority

Offer prizes!

To really encourage the filing of many issues, you can offer prizes for the most non-duplicate issues reported!

The case for Edge on OSX & Linux

Ever since Microsoft Edge was leaked for OSX, I’ve used it. Not exclusively, however as a UI Architect it’s both my job and my hobby to keep an eye on the landscape. I’ve certainly gotten a lot of flack for it – both from friends and coworkers – however when presented with my reasoning, they all agree that it makes sense (though it’s often not for them).

The reasoning is simple: I can keep my work browsing separate from my personal browsing. It all comes down to this:

That’s right, sync. Edge uses Office365’s IdP and, especially in large corporations, that usually means you get a seamless sync experience sandboxed to your work. Additionally, Edge’s integration with Office365 is surprisingly useful, revealing my direct colleagues’ docs activities so I can keep up to date on what’s moving on my team.

It’s certainly not perfect – operating systems still only permit only one default browser, so opening links can sometimes be a bear as I copy-paste furiously between Chromium and Edge (I’m looking at you, Slack). Lastly, it’s certainly not useful if you’re working in a Google Apps environment.

Furthermore, as someone who loves his unit and functional testing, having a karma launcher that works on OSX would be lovely; not to mention a selenium-enabled docker image. Yet even without those, it’s already crossed my own Threshold of Usefulness™️.

But wait, what about Firefox? Well, Firefox now stands alone in the browser wars as the only Not-Chromium browser with major market share. As someone who works daily on user interfaces, I will naturally default to the experience that most of my customers do.

Angular > React

I’ve been overseeing UI projects for … oh, decades now. I’ve shipped production applications in backbone.js, SproutCore, Angular 1, Angular 2, React/Redux, even going back to the old days of the Adobe Flex ecosystem with Cairngorm, PureMVC, and Parsley. I’m old, I’m grouchy, I’ve been around the block a few times, I’ve made all the mistakes.

I still think Angular is better than React.

Now, don’t get me wrong – I understand the appeal of React, and why Angular’s a tough sell (even though they can work together rather well). React’s learning curve is lower, it’s not a full framework, you can add React to an already existing application easily, while Angular (to be effective) requires a full rewrite. It’s very easy to get started with React, while Angular has a famously steep learning curve. React doesn’t force much on you, Angular… really does. As a result of this, most ‘comparisons’ I’ve found out there are poorly informed, and rarely overcome the resentment of “why are you making me use ThingA rather than ThingB I’m used to?”

And, lastly, React is the “sexy JS library du jour”, hand in hand with Vue.js. That counts for a lot, even though this is the Bandwagon Fallacy.

The React Snowflake

The first reason I dislike React, is because it encourages snowflake projects; projects that are different enough in their other technical decisions as to be effectively unique. This is often put forward as one of the “nice” things about React, however I consider it its worst shortcoming.

To illustrate, let’s list out a few of the other decisions that go into a project:

  • Language: Typescript, Javascript, other.
  • Build system: Webpack, Gulp, Grunt, Bazel, Babel?
  • Data layer: RxJS, Redux, MobX, too many to list.
  • In-app navigation: HTML5, #! based, React-navigation, etc.
  • Linting rules: Single-line vs multi-line variables, 2 vs. 4 spaces, etc.

If the above seems mostly irrelevant from your point of view, then I hazard to guess that you are working on a small team, where the learning curve and arguments around these technical decisions is lost in the churn of on-boarding. Either that, or your organization has put a lot of work into enforcing consistency.

“There should be one– and preferably only one –obvious way to do it.”

Zen of Python (PEP-20)

In the Angular ecosystem, most of the above decisions have been made for you. As a result, if you switch between Angular applications, it’s very likely that are that you’ve seen it all before. While on the outside this seems authoritarian, in reality it is highly liberating – you are free to focus on interesting technical problems, instead of fighting the tabs-vs-space battle for the upteenth time in your career.

In short: React encourages divergence, while Angular encourages consistency.

Feature Reuse (it’s more than just code)

If your company is successful, it will grow. This usually means additional projects, such as admin dashboards, white labels, or even entirely new products. The way this usually happens is with a copy/paste of the existing project, deleting all the bits that aren’t needed for the new application, at which point the implementation details begin to fragment. This especially becomes a headache when trying to maintain a consistent user experience, as one button may evolve away from the other. 

This is where Angular really shines, though I’ll have to dig into technical details a little to make my point. In short, Angular is actually an Inversion of Control engine. You build a component or service, and then provide it to the Injector to manage. This can be done individually, or groups of components can be bundled together in a ‘Module‘. These modules are powerful – as an example, you could build a UserProfileModule that, when injected, takes over the /user/profile path in any application that it’s provided to (it can even be lazy-loaded). Other modules can provide ambient features, such as the popular Angulartics2 metrics library.

As you can see, this is much more powerful concept than simple ‘code reuse’, as entire features can be not only shipped, but trivially imported. It’s one of the advantages of the heavy guard-rails that come with Angular – if everyone agrees on how to build lego’s, then it’s easier to fit them together.

So now, let’s return to the above problem. You’re creating a brand new application, and rather than copy/pasting from existing codebases, you add a dependency on an angular library that provides you with modules that you want to use. If the library updates, you do too.

A more concrete example of this is VMware Clarity. One of the reasons we’ve been so successful at maintaining a consistent UI across all of VMware’s SaaS Products is because our UI components come in one easy-to-use module.

Technical choices are business choices

Are you in a startup, planning to be acquired? Are you in a large organization, with multiple projects that need to maintain a common look & feel? Are you about to face a major re-org? Is one product being moved to maintenance while another is being built?

These kinds of changes are common, as our industry is very dynamic. And speaking as someone who’s been involved in Technical Due Diligence, I’ve seen acquisition deals fall through because technical decisions made by the prospective organizations would be too difficult to reconcile.

In the end, every technical decision we make is a strategic one – which is one of the reasons I’m a huge fan of the Choose Boring Technology philosophy. With that in mind, recall that an Angular application can use React components, while a React application cannot easily use an Angular application. In other words, it’s easier for you to adopt another organization’s code if you start with Angular, even if they’re React-based.

Encrypting sensitive variables in terraform using GnuPG agent

This post will walk you through how to encrypt sensitive terraform variables in a way that still permits them to be committed to VCS, while also being reasonably easy to decrypt. Examples use bash, however are easily adapted to other environments. Special thanks to this post on encrypting the ansible vault password, as my examples draw heavily from that source.

This method is particularly awesome, because you can explicitly declare who is permitted to decrypt it. So, for instance, all of the engineers on your team could be unable to access its contents, while your CI/CD system (jenkins or whatnot) can use their own gpg identity to decrypt the data.

Step 1: Create a json file with your sensitive variables

echo "{ \"github_api_key\": \"secret\" }" >> variables.json

Step 2: Encrypt it using gpg

gpg --encrypt ./variables.json 

The above call will ask you to add a list of recipients. In general it’s a good idea to add yourself, so you can decrypt the file at a later point. You can also add other users, such as the aforementioned CI/CD engine, or specific members of your team that you trust.

Step 3: Create an external provider in terraform

Terraform permits you to read data from external programs, as long as they return JSON. This is easily done:

data "external" "gpg" {
  program  = [ "gpg", "--batch", "--use-agent", "--decrypt", "./variables.json.gpg"]
}

provider "github" {
  token    = "${data.external.gpg.result.github_api_key}"
}

Now, assuming that you have a running gpg-agent, you’ll only have to enter your gpg passkey as your own security policy requires.

The only major caveat with the above, is that you cannot use these variables to maintain the terraform state.