
Package Registry

Evan Yamanishi
Director of Engineering

Let's move all our internal/private Node.js packages to a GitLab-hosted package registry, which we'll then use to publish and host any type of package that GitLab supports going forward.

Motivation

When we first started publishing Node.js packages to a registry, we weren't on GitLab so our only option was self-hosting. That resulted in solutions like npm.wwnorton.net (hosted on AWS) and the Norton Lab registry (hosted on GCP). Now that we are on GitLab, we have the option to use GitLab's managed solution.

Beyond that timely opportunity, there are three additional key motivators that I'll outline in the following sections.

  1. Why we need a package registry in the first place.
  2. Why we need a private package registry.
  3. Why we need a new solution (what's wrong with our current approach).

Why Do We Need a Package Registry?

A package registry is a server where developers can publish "packages": code that is meant to be used by other developers. For instance, Node.js package registries give us a space to publish JavaScript code so that other JavaScript projects can install that code and use it. Other examples of modern package registries include JSR for JavaScript & TypeScript, crates.io for Rust packages ("crates"), and registry.terraform.io for Terraform packages ("modules").

In the absence of a package registry, developers might turn to solutions that lead to tighter coupling like code vendoring or build-time transclusion (custom logic to pull in external code during build).

Package registries enable loose coupling through a few key features:

  1. They are pull-based, meaning consumers must opt into using a package.
    Most package managers use some form of an install command to add a package, putting the onus on the consumer to pull the new package. For instance, npm uses npm install.
  2. They can enforce immutability, meaning it's impossible to change a published version once published. This prevents a range of issues including supply chain attacks (changing a published version to add malicious code), non-reproducible builds (changing a published version can result in different code ending up in subsequent builds of identical code), and general change breakage (changing a published version could break functionality for consumers).
    • I say "can enforce" instead of just "enforce" because it's not a guarantee. For instance, npm initially allowed a published version to be overwritten, but it has since made published versions immutable.
  3. They encourage semantic versioning, which provides a clear and standard consumer/provider contract. While not always required or enforced, package registries strongly encourage the use of semantic versioning, which gives consumers of packages a programmatic way of determining the nature of changes. For instance, NPM started enforcing semver for the latest tag in v11.
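To make the "programmatic way of determining the nature of changes" concrete, here is a minimal sketch of how a consumer might classify the difference between two versions. The helper names (parseVersion, classifyBump) are hypothetical and not part of npm or any registry; the sketch only handles plain MAJOR.MINOR.PATCH strings and ignores prerelease and build metadata.

```typescript
// Hypothetical helpers for illustration only; not part of npm or any registry API.
type Bump = "major" | "minor" | "patch" | "none";

// Parse a plain MAJOR.MINOR.PATCH string into its three numeric components.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Report the most significant component that differs between two versions.
function classifyBump(from: string, to: string): Bump {
  const [fromMajor, fromMinor, fromPatch] = parseVersion(from);
  const [toMajor, toMinor, toPatch] = parseVersion(to);
  if (toMajor !== fromMajor) return "major"; // breaking changes are expected
  if (toMinor !== fromMinor) return "minor"; // backwards-compatible features
  if (toPatch !== fromPatch) return "patch"; // backwards-compatible fixes
  return "none";
}
```

Under semantic versioning, a consumer could treat a "major" result as a signal to read the changelog before upgrading, while "minor" and "patch" results should be safe to pull in.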

Why a Private Registry?

Package registries are great but they're typically designed for open source software, meaning anything we publish to them is going to be publicly available. This works for open source projects like the Norton Design System, but is a deal breaker for any other code that we want to share amongst teams but not with the public.

Most public registries use a SaaS model where they allow organizations to publish private packages for a fee (the public NPM registry uses this model, for instance), so one option available to us would be to create a paid organization with such a service. More on this in alternatives.

Why Change?

ENG-68 covers our current state in detail, but the summary is that we currently have multiple registries for different teams, each of which is under-maintained. Here's an overview of the registries that I know about right now:

  1. https://npm.wwnorton.net is what most teams use today. It's a Verdaccio instance in AWS that we "maintain" (it's not really maintained by any of my measures). It's behind a private AWS VPC, which means you have to be connected to the VPN to access it.
  2. Pub Tools publishes packages to a Verdaccio registry hosted in GCP that I created for the Norton Lab many years ago. It's also not really maintained.
  3. I created https://gitlab.com/wwnorton/ops/package-registry to give Pub Tools a new place to publish their packages but it's not currently being used by anyone.

The common theme here is that we're not maintaining any of our self-hosted proxy registries (the Verdaccio-based ones), and I'm skeptical that we ever will. Maintaining a package registry is just not a valuable enough activity for any of us, we don't currently have any teams assigned to the task, and there are fully-managed options available to us now that weren't when our self-hosted solutions were created.

For that reason, I'm proposing we abandon self-hosting and move to GitLab as our canonical package registry.

Proposed solution

I propose we move all proprietary packages to GitLab's Package Registry, a fully-fledged package registry that currently supports all the package formats that we need today. This will be a single project where all official Norton packages should be published, following GitLab's "Store all of your packages in one GitLab project" model.

I believe moving to a GitLab-managed solution is a good choice for a few reasons:

  1. It's a managed solution. Unlike Verdaccio, Norton engineers won't need to do things like updating the underlying application, configuring networking & DNS records, setting up authentication and access control, or archiving, just to name a few examples of things that have been a pain with our Verdaccio registries.
  2. We have a contract with GitLab. This means we're already paying for this feature, but perhaps more importantly, that we have support and a contractual recourse should something go wrong.
  3. We use GitLab for many other things, which means this reduces our tool/solution surface. I consider this a plus because of the context switching cost when using many tools. In other words, "where's the package registry?" will be one less thing to remember.
  4. GitLab's transparent vision for their Package stage gives me more confidence that they will continue supporting this solution over a reasonable time horizon. We can never be 100% certain that companies will continue supporting solutions, but we at least have some strong signals from GitLab. Another market signal is that GitHub—GitLab's primary competitor—is also investing in package solutions.

Every GitLab project has a package registry attached to it where that codebase can publish its own packages. While this is available to teams today, using a distributed system like this makes for an awful developer experience where you'd have to know the package registry address for every package that you want to install.

Detailed design

We should have one repository on GitLab that all projects can both publish packages to and consume packages from (a project-level registry). This repository won't contain any real code other than documentation in the form of a README, which will cover usage and management. Its primary function will be its package registry (${repo_name}/-/packages), which will contain all the packages that are published to it.

I've already created this repo at https://gitlab.com/wwnorton/ops/package-registry and fleshed out the README to describe usage for Node.js / NPM. If we ever need to support additional package managers, details should be added for that package management solution. For instance, if we want to use it as a Maven registry for Java packages, an entry should be added under the ### Java / Maven heading.

The only change that I would propose for the final design is to move the repository up to the https://gitlab.com/wwnorton organization level since it can and should be used by anyone at Norton. This will make it more visible to anyone who needs to use or manage it.
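As a rough illustration of what consuming from a single project registry looks like, the sketch below builds the two .npmrc lines that point the @wwnorton scope at a GitLab project-level npm endpoint. The URL shape follows GitLab's documented project-level npm registry path, but the helper name (buildNpmrc), the project ID (12345), and the token variable name are assumptions for illustration; the README in the repository remains the source of truth for the real values.

```typescript
// Hypothetical helper for illustration only. The project ID passed in is a placeholder;
// the real numeric ID comes from the package-registry project on GitLab.
function buildNpmrc(projectId: number, tokenEnvVar: string): string {
  // Project-level npm endpoint on gitlab.com, without the protocol.
  const endpoint = "gitlab.com/api/v4/projects/" + projectId + "/packages/npm/";
  return [
    // Route every @wwnorton-scoped package to the GitLab registry.
    "@wwnorton:registry=https://" + endpoint,
    // npm expands ${VAR} from the environment, so the token never lands in the file.
    "//" + endpoint + ":_authToken=${" + tokenEnvVar + "}",
  ].join("\n");
}

// Example: lines a CI job might write, assuming it authenticates with CI_JOB_TOKEN.
console.log(buildNpmrc(12345, "CI_JOB_TOKEN"));
```

Scoping the registry to @wwnorton means unscoped and third-party packages would still resolve from the public npm registry, so only private Norton packages depend on GitLab access.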

Drawbacks

There are two major drawbacks to this:

  1. It'll more tightly couple our development workflows to infrastructure that we don't control. In other words, if gitlab.com (or their package management service) goes down, we won't be able to run npm install locally or in CI jobs, halting many types of development tasks. This is not a minor concern, but it strikes me as a reasonable risk. Should it become a problem, I would suggest creating a self-managed clone of the registry that we can switch to during a GitLab outage, but that would be premature optimization to do now.
  2. Anyone (human or agent) who needs to install a private Norton package will need access to the package registry on GitLab. The implications may be obvious, but they're worth making explicit:
    1. While Guests can install packages, on our current subscription (Premium), Guest users count towards the license seat usage (~$200 / seat / year).
    2. Third parties won't have access to install private Norton packages.
    3. GitLab pipeline agents (the CI/CD job token) will need access to this repository.

Alternatives

Our primary alternatives include:

  1. Continuing to maintain npm.wwnorton.net. This is less desirable because it's infrastructure we have to manage. And while that might work for NPM packages, we get support for other package types like Helm charts or PyPI for free with the GitLab solution.
  2. Using a different package management service. I'll be honest that I haven't researched these much, but even if one exists that might work, we would have to start a relationship with that service. Since we already have a contract with GitLab, this would be a purely added cost, both in management and in dollars.
    • An example of this would be creating a paid account on the public NPM registry, which would allow us to publish private packages.

Adoption strategy

Once the package registry is officially "ready" for publishing, we'll deprecate all other package registries, setting a date for when they'll be shut down. This date should be no less than three months after the deprecation to give teams time to adjust their codebases before terminating the old infrastructure. Once the deprecation date is set, I'll coordinate with all engineers to begin publishing their packages to the registry ahead of that timeline.

Package maintainers will need to consider whether they need to back-publish old versions or not. Back-publishing can be done in any order as long as the latest version is tagged once they're done back-publishing (npm dist-tag add @wwnorton/${package_name}@${version} latest).
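Because back-publishing can happen in any order, the final step is working out which version should carry the latest tag. The sketch below is one way a maintainer might do that: it picks the highest stable version from a list and builds the dist-tag command quoted above. The helper names (compareVersions, latestTagCommand) and the example package name are hypothetical, and the sketch assumes prerelease versions should never be tagged as latest.

```typescript
// Hypothetical helpers for illustration only.

// Compare two plain MAJOR.MINOR.PATCH versions numerically, component by component.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Build the dist-tag command for the highest stable version in the list.
function latestTagCommand(packageName: string, versions: string[]): string {
  // Assumption: only plain MAJOR.MINOR.PATCH versions qualify; prereleases are skipped.
  const stable = versions.filter((v) => /^\d+\.\d+\.\d+$/.test(v));
  if (stable.length === 0) {
    throw new Error("No stable versions to tag as latest");
  }
  const newest = [...stable].sort(compareVersions)[stable.length - 1];
  return `npm dist-tag add @wwnorton/${packageName}@${newest} latest`;
}

// Example with a made-up package name and version history, listed out of order.
console.log(latestTagCommand("example-lib", ["1.10.0", "1.2.0", "2.0.0-beta.1", "1.9.3"]));
```

Comparing the components numerically matters here: a plain string sort would rank 1.9.3 above 1.10.0 and tag the wrong version.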

After the cut-over, our CI/CD components should also help with the adoption by setting the default npm registry for @wwnorton-scoped Node.js packages so that developers don't have to duplicate this setup in their code. It would also be ideal to create a component to automate publishing to the new package registry for our libraries, but that's beyond the scope of this RFC.

Unresolved questions

  1. Are there any package registries I missed in ENG-68?
  2. How many packages are currently being published to other private registries? Who maintains them?