January 22, 2025
A person making pesto sauce with a mortar and pestle

By: Brie Bunge and Sharmila Jesupaul

At Airbnb, we've recently adopted Bazel, Google's open source build tool, as our universal build system across backend, web, and iOS platforms. This post covers our experience adopting Bazel for Airbnb's large-scale (over 11 million lines of code) web monorepo. We'll share how we prepared the code base, the principles that guided the migration, and the process of migrating selected CI jobs. Our goal is to share the information that would have helped us when we embarked on this journey, and to contribute to the growing discussion around Bazel for web development.

Historically, we wrote bespoke build scripts and caching logic for various continuous integration (CI) jobs. These proved challenging to maintain and consistently reached scaling limits as the repo grew. For example, our linter (ESLint) and TypeScript's type checking did not support multi-threaded concurrency out-of-the-box. We extended our unit testing tool, Jest, to act as the runner for these tools because it had an API for leveraging multiple workers.

It was not sustainable to keep creating workarounds for the inefficiencies of tooling that did not support concurrency, and we were incurring a long-term maintenance cost. To tackle these challenges and best support our growing codebase, we found that Bazel's sophistication, parallelism, caching, and performance met our needs.

Additionally, Bazel is language agnostic. This made it possible to consolidate onto a single, universal build system across Airbnb and to share common infrastructure and expertise. Now, an engineer who works on our backend monorepo can switch to the web monorepo and already know how to build and test things.

When we began the migration in 2021, there was no publicized industry precedent for integrating Bazel with web at scale outside of Google. Open source tooling didn't work out-of-the-box, and leveraging remote build execution (RBE) introduced additional challenges. Our web codebase is large and contains many loose files, which led to performance issues when transmitting them to the remote environment. We also established migration principles: improve or maintain overall performance, and reduce the impact on developers contributing to the monorepo during the transition. We achieved both of these goals. Read on for more details.

We did some work up front to make the repository Bazel-ready, namely cycle breaking and automated BUILD.bazel file generation.

Cycle Breaking

Our monorepo is laid out with projects under a top-level frontend/ directory. To start, we wanted to add BUILD.bazel files to each of the ~1000 top-level frontend directories. However, doing so created cycles in the dependency graph. Bazel does not allow this because the build targets must form a directed acyclic graph (DAG). Breaking these cycles often felt like battling a hydra, since removing one cycle spawned more in its place. To accelerate the process, we modeled the problem as finding the minimum feedback arc set (MFAS)¹, which is the smallest set of edges whose removal leaves a DAG. This set minimized disruption and level of effort, and it surfaced pathological edges.
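To illustrate the idea, here is a minimal Python sketch. It is not the method from the paper cited in footnote 1. Instead, it swaps in the simpler greedy heuristic of Eades, Lin, and Smyth. The heuristic builds a node ordering by peeling off sinks and sources first, then taking the node with the largest out-degree minus in-degree. It then reports every edge that points backward in that ordering. Removing those edges always leaves a DAG, but the set is not guaranteed to be minimum. The project names in the test are hypothetical.

```python
from collections import defaultdict, deque


def greedy_feedback_arc_set(edges):
    """Return edges whose removal leaves the graph acyclic (Eades-Lin-Smyth heuristic).

    The result always breaks every cycle but is not guaranteed to be minimum.
    """
    nodes, out_edges, in_edges = set(), defaultdict(set), defaultdict(set)
    for u, v in edges:
        nodes.update((u, v))
        if u != v:  # self-loops are always feedback arcs; they are caught at the end
            out_edges[u].add(v)
            in_edges[v].add(u)

    remaining = set(nodes)
    left, right = [], []  # left fills the ordering from the front, right from the back

    def remove(n):
        remaining.discard(n)
        for m in out_edges[n]:
            in_edges[m].discard(n)
        for m in in_edges[n]:
            out_edges[m].discard(n)

    while remaining:
        changed = True
        while changed:
            changed = False
            for n in [n for n in remaining if not out_edges[n]]:  # sinks go last
                remove(n)
                right.append(n)
                changed = True
            for n in [n for n in remaining if not in_edges[n]]:  # sources go first
                remove(n)
                left.append(n)
                changed = True
        if remaining:
            # No sources or sinks remain, so a cycle exists; take the most source-like node.
            n = max(remaining, key=lambda x: (len(out_edges[x]) - len(in_edges[x]), str(x)))
            remove(n)
            left.append(n)

    position = {n: i for i, n in enumerate(left + right[::-1])}
    # Any edge pointing backward (or to itself) in the ordering is a feedback arc.
    return {(u, v) for u, v in edges if position[v] <= position[u]}


def is_acyclic(edges):
    """Kahn's algorithm: True if a topological order reaches every node."""
    indegree, succ, nodes = defaultdict(int), defaultdict(list), set()
    for u, v in edges:
        nodes.update((u, v))
        succ[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return seen == len(nodes)
```

In practice, the returned edges form the worklist of imports to move or invert. Because the ordering only approximates the optimum, reviewing the list by hand can still reveal a cheaper cut.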

Automated BUILD.bazel Generation

We automatically generate BUILD.bazel files for the following reasons:

  1. Most of their contents can be derived from statically analyzable import / require statements.
  2. Automation allowed us to quickly iterate on BUILD.bazel changes as we refined our rule definitions.
  3. The migration would take time to complete, and we didn't want to ask users to keep these files up-to-date while they weren't yet gaining value from them.
  4. Manually keeping these files up-to-date would constitute an additional Bazel tax, regressing the developer experience.

We have a CLI tool called sync-configs that generates dependency-based configurations in the monorepo (e.g., tsconfig.json, project configuration, and now BUILD.bazel). It uses jest-haste-map and watchman with a custom version of the dependencyExtractor to determine the file-level dependency graph, and it uses part of Gazelle to emit BUILD.bazel files. The tool is similar to Gazelle, but it also generates additional web-specific configuration files, such as the tsconfig.json files used in TypeScript compilation.
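The core of such a generator can be sketched in a few lines of Python. This is not sync-configs itself, and it makes several assumptions. It uses a simplified regex-based import extractor. It assumes one target per top-level project under frontend/. It emits a hypothetical `web_library` rule with hypothetical `//frontend/<project>` and `@npm//<package>` labels.

```python
import posixpath
import re

# Simplified extractor for static ES module and CommonJS dependencies:
#   import x from "y"   export { x } from "y"   import "y"   require("y")   import("y")
IMPORT_RE = re.compile(
    r"""\b(?:import|export)\s[^'"]*?\bfrom\s*['"]([^'"]+)['"]"""
    r"""|\bimport\s*['"]([^'"]+)['"]"""
    r"""|\b(?:require|import)\s*\(\s*['"]([^'"]+)['"]\s*\)"""
)


def extract_specifiers(source):
    """Return import specifiers in first-seen order, without duplicates."""
    found = []
    for match in IMPORT_RE.finditer(source):
        spec = next(group for group in match.groups() if group)
        if spec not in found:
            found.append(spec)
    return found


def to_label(spec, importer_dir):
    """Map a specifier to a (hypothetical) Bazel label."""
    if spec.startswith("."):
        # Relative import: resolve it and keep the frontend/<project> prefix.
        parts = posixpath.normpath(posixpath.join(importer_dir, spec)).split("/")
        return "//" + "/".join(parts[:2])
    # Bare specifier: an npm package, possibly scoped (@scope/name).
    parts = spec.split("/")
    package = "/".join(parts[:2]) if spec.startswith("@") else parts[0]
    return "@npm//" + package


def generate_build(project_dir, files):
    """files maps paths relative to project_dir to their source text."""
    own_label = "//" + project_dir
    deps = set()
    for rel_path, source in files.items():
        importer_dir = posixpath.dirname(posixpath.join(project_dir, rel_path))
        for spec in extract_specifiers(source):
            label = to_label(spec, importer_dir)
            if label != own_label:  # imports within the same project need no dep
                deps.add(label)
    lines = ["web_library(", f'    name = "{posixpath.basename(project_dir)}",', "    srcs = ["]
    lines += [f'        "{src}",' for src in sorted(files)]
    lines += ["    ],", "    deps = ["]
    lines += [f'        "{dep}",' for dep in sorted(deps)]
    lines += ["    ],", ")"]
    return "\n".join(lines) + "\n"
```

A real generator would also need to filter Node built-ins (such as `fs`), honor path aliases, and handle specifiers that a regex cannot see. Regenerating deterministically (sorted srcs and deps) keeps the output diff-friendly, so it can be re-run on every change.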

With the preparation work complete, we proceeded to migrate CI jobs to Bazel. This was a massive undertaking, so we divided the work into incremental milestones. We audited our CI jobs and chose to migrate the ones that would benefit the most: type checking, linting, and unit testing². To reduce the burden on our developers, we made the central Web Platform team responsible for porting CI jobs to Bazel. We proceeded one job at a time so we could deliver incremental value to developers sooner, gain confidence in our approach, focus our efforts, and build momentum. For each job, we ensured that the developer experience was high-quality, performance improved, CI failures were reproducible locally, and the tooling Bazel replaced was fully deprecated and removed.

We started with the TypeScript (TS) CI job. We first tried the open source ts_project rule³. However, it didn't work well with RBE because of the sheer number of inputs, so we wrote a custom rule to reduce the number and size of the inputs.

The biggest source of inputs was node_modules. Before this change, the files for each npm package were uploaded individually. Since Bazel works well with Java, we followed the model of Java JAR files (essentially zips) and packaged each npm package into two tars: a full tar, and a TS-specific tar containing only the *.ts and package.json files.
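Below is a minimal Python sketch of the per-package packaging step. It assumes a hypothetical output naming scheme (`<name>.tar` and `<name>.ts.tar`) and is not Airbnb's actual rule. It normalizes timestamps, ownership, and permission bits and adds members in sorted order. As a result, the archive contents don't vary with checkout time or user, which helps keep Bazel's content-based cache keys stable.

```python
import tarfile
from pathlib import Path, PurePosixPath


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that varies between machines so archives are reproducible."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mode = 0o755 if info.mode & 0o111 else 0o644
    return info


def is_ts_input(rel_path: str) -> bool:
    """Files the type checker needs: *.ts (including *.d.ts) and package.json."""
    return rel_path.endswith(".ts") or PurePosixPath(rel_path).name == "package.json"


def pack_npm_package(pkg_dir, out_dir):
    """Write a full tar and a TS-only tar for one installed npm package."""
    pkg_dir, out_dir = Path(pkg_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    full_path = out_dir / f"{pkg_dir.name}.tar"
    ts_path = out_dir / f"{pkg_dir.name}.ts.tar"
    files = sorted(p for p in pkg_dir.rglob("*") if p.is_file())  # stable member order
    with tarfile.open(full_path, "w") as full_tar, tarfile.open(ts_path, "w") as ts_tar:
        for path in files:
            rel = path.relative_to(pkg_dir).as_posix()
            full_tar.add(path, arcname=rel, filter=_normalize)
            if is_ts_input(rel):
                ts_tar.add(path, arcname=rel, filter=_normalize)
    return full_path, ts_path
```

The benefit for RBE is that each package becomes one or two large blobs instead of thousands of small files. Type-checking actions can depend on the smaller TS-only tar.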

Another source of inputs was transitive dependencies. Transitive node_modules and d.ts files were being included in the sandbox because, technically, subsequent project compilations can need them. For example, suppose project foo depends on bar, and types from bar are exposed in foo's emitted declarations. Then project baz, which depends on foo, also needs bar's outputs in its sandbox. For long chains of dependencies, this can bloat the inputs significantly with files that aren't actually needed. TypeScript has a --listFiles flag that reports which files are part of the compilation. We package this limited set of files, along with the emitted d.ts files, into an output tsc.tar.gz file⁴. With this, targets need to include only direct dependencies rather than all transitive dependencies⁵.
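Below is a rough Python sketch of that pruning step, with several assumptions. The rule captures the `tsc --listFiles` output (one path per line) as text, and downstream targets need only declaration (.d.ts) files. Files outside the sandbox root are dropped, and all function names are hypothetical. The sketch keeps the .d.ts files the compiler actually read plus the project's own emitted declarations, instead of every transitive file in the sandbox. It uses gzip, matching the tsc.tar.gz name.

```python
import tarfile
from pathlib import Path, PurePosixPath


def listed_inputs(list_files_stdout: str, sandbox_root: str) -> list[str]:
    """Parse `tsc --listFiles` output into sorted, sandbox-relative paths.

    Paths outside the sandbox root are dropped (an assumption of this sketch).
    """
    root = PurePosixPath(sandbox_root)
    rels = set()
    for line in list_files_stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            rels.add(PurePosixPath(line).relative_to(root).as_posix())
        except ValueError:
            continue  # not under the sandbox root
    return sorted(rels)


def types_tar_members(listed: list[str], emitted_dts: list[str]) -> list[str]:
    """Declarations the compiler consulted, plus this project's own emitted .d.ts."""
    return sorted({p for p in listed if p.endswith(".d.ts")} | set(emitted_dts))


def write_types_tar(sandbox_root, members, out_path):
    """Package the pruned set so dependents get only what type checking read."""
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in members:
            tar.add(Path(sandbox_root) / rel, arcname=rel)
```

A dependent target then unpacks only its direct dependencies' tarballs. Anything reachable further down the chain is already inside them if the compiler actually used it.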

Diagram showing how we use tars and the --listFiles flag to prune the inputs/outputs of :types targets

This custom rule unblocked switching to Bazel for TypeScript, as the job was now well under our CI runtime budget.

Bar chart showing the speed-up from switching to our custom genrule

We migrated the ESLint job next. Bazel works best with actions that are independent and have a narrow set of inputs. Some of our lint rules (e.g., specific internal rules, import/export, import/extensions) inspected files other than the one being linted. We restricted our lint rules to those that could operate in isolation. This reduced input size and meant we only had to lint directly affected files. It also meant moving or deleting some lint rules (e.g., those made redundant by TypeScript). As a result, we reduced CI times by over 70%.
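To see why isolation narrows the work, compare which files must be re-linted after a change. Below is a minimal Python sketch using a hypothetical reverse-dependency map. When every rule looks only at the file being linted, only the changed files need linting, and Bazel's cache can skip the rest. When some rule reads other files, a conservative runner must also re-lint everything that imports a changed file, directly or transitively.

```python
from collections import deque


def files_to_lint(changed, importers, rules_are_isolated):
    """Return the set of files that must be re-linted after `changed` files change.

    importers maps a file to the files that import it (reverse dependency edges).
    """
    if rules_are_isolated:
        # Each lint action's inputs are just the file (plus config), so nothing else is affected.
        return set(changed)
    # A cross-file rule (e.g. import/export) in an importer may read a changed file,
    # so conservatively re-lint every direct and transitive importer as well.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, ()):
            if importer not in affected:
                affected.add(importer)
                queue.append(importer)
    return affected
```

In a large repo, a widely imported utility file can have thousands of transitive importers. This asymmetry is why dropping cross-file rules matters so much.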

Time series graph showing the runtime speed-up in early May from only running ESLint on directly affected targets

Our next challenge was enabling Jest. This was uniquely difficult: we needed to bring along a much larger set of first- and third-party dependencies, and there were more Bazel-specific failures to fix.

Worker and Docker Cache

We tarred up dependencies to reduce input size, but extraction was still slow. To address this, we introduced caching. One cache layer lives on the remote worker, and another lives in the worker's Docker container, baked into the image at build time. The Docker layer exists so we don't lose our cache when remote workers are auto-scaled. A weekly cron job updates the Docker image with the latest set of cached dependencies. This strikes a balance between keeping the cache fresh and avoiding image thrashing. For more details, check out this Bazel Community Day talk.

Diagram showing npm dependencies symlinked to a Docker cache and a worker cache

This caching gave us a ~25% overall speed-up in our Jest unit testing CI job. It also reduced the time to extract our dependencies from 1–3 minutes to 3–7 seconds per target. The implementation required us to enable the Node.js preserve-symlinks option and to patch some of our tools that resolved symlinks to their real paths. We extended this caching strategy to our Babel transformation cache, another source of poor performance.

Implicit Dependencies

Next, we needed to fix Bazel-specific test failures. Most were caused by missing files. Some inputs are not statically analyzable, for example a file referenced as a string without an import, or a Babel plugin referenced by name in .babelrc. For these, we added support for a Bazel keep comment (e.g., // bazelKeep: path/to/file), which acts as if the file were imported. The advantages of this approach are:

1. It is colocated with the code that uses the dependency,

2. BUILD.bazel files don't need to be manually edited to add/move # keep comments,

3. There is no effect on runtime.
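Below is a minimal Python sketch of how a generator could honor such comments. It assumes the comment occupies a whole line in the exact `// bazelKeep: path/to/file` form shown above. The helper names are hypothetical. Kept paths are simply merged into the statically extracted dependencies, so downstream BUILD generation treats them like imports.

```python
import re

# Whole-line comments of the form: // bazelKeep: path/to/file
KEEP_RE = re.compile(r"^\s*//\s*bazelKeep:\s*(\S+)\s*$", re.MULTILINE)


def keep_dependencies(source: str) -> list[str]:
    """Paths declared with bazelKeep comments, in source order."""
    return KEEP_RE.findall(source)


def all_dependencies(static_imports, source):
    """Static imports followed by any bazelKeep paths not already present."""
    deps = list(static_imports)
    for path in keep_dependencies(source):
        if path not in deps:
            deps.append(path)
    return deps
```

Because the comment is ignored at runtime, it only affects what ends up in the sandbox, which matches the third advantage listed above.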

A small number of tests were unsuitable for Bazel because they required a broad view of the repository or a dynamic, implicit set of dependencies. We moved these tests out of our unit testing job into separate CI checks.

Preventing Backsliding

With over 20,000 test files and hundreds of people actively working in the same repository, we needed to make test fixes in a way that would not be undone as product development progressed.

Our CI has three types of build queues:

1. “Required”, which blocks changes,

2. “Optional”, which is non-blocking,

3. “Hidden”, which is non-blocking and not shown on PRs.

As we fixed tests, we moved them from “hidden” to “required” via a rule attribute. To ensure a single source of truth, tests running in “required” under Bazel were no longer run under the Jest setup being replaced.

# frontend/app/script/__tests__/BUILD.bazel
jest_test(
    name = "jest_test",
    is_required = True,  # makes this target a required check on pull requests
    deps = [
        ":source_library",
    ],
)

Example jest_test rule. The is_required attribute means this target runs on the “required” build queue.
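The bookkeeping implied by this attribute can be sketched in Python. The data shapes are hypothetical: each Bazel test target's is_required flag and the test files it covers, plus the file list of the legacy Jest job. Flipping is_required moves a target from the hidden queue to the required queue, and its files drop out of the legacy run, so each test has exactly one gating runner.

```python
def plan_ci(test_targets, legacy_test_files):
    """Split Bazel test targets into queues and prune the legacy Jest run.

    test_targets: label -> (is_required, test files covered by the target)
    legacy_test_files: every test file the old non-Bazel Jest job would run
    """
    required = sorted(label for label, (is_req, _) in test_targets.items() if is_req)
    hidden = sorted(label for label, (is_req, _) in test_targets.items() if not is_req)
    covered = {f for label in required for f in test_targets[label][1]}
    # Single source of truth: a test gating PRs under Bazel is not also run by legacy Jest.
    legacy = sorted(set(legacy_test_files) - covered)
    return {"required": required, "hidden": hidden, "legacy_jest": legacy}
```

Once every target is required, the legacy list is empty. At that point the old infrastructure and the attribute itself can be removed.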

We wrote a script that compared results before and after Bazel to determine migration readiness, using test runtime, code coverage, and failure rate as metrics. Fortunately, the bulk of tests could be enabled without additional changes, so we enabled these in batches. The central Web Platform team divided and conquered the remaining burndown list of failures, fixing and updating tests in Bazel so that this burden didn't fall on our developers. After a grace period, we fully disabled and deleted the non-Bazel Jest infrastructure and removed the is_required parameter.
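Below is a sketch of what such a comparison might look like in Python. The record type, function names, and thresholds are illustrative assumptions, not the values Airbnb used. Each test's legacy run is compared with its Bazel run on the three metrics named above. Passing tests form the next batch to enable, and failing tests go onto the burndown list along with the reasons.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    runtime_s: float      # wall-clock time for the test file
    coverage_pct: float   # line coverage reported for the run
    failure_rate: float   # fraction of repeated runs that failed (0.0 to 1.0)


def readiness(before, after, max_slowdown=1.10, max_coverage_drop=0.5):
    """Compare a test's legacy Jest run (before) with its Bazel run (after).

    Returns (ready, reasons). Thresholds here are illustrative only.
    """
    reasons = []
    if after.runtime_s > before.runtime_s * max_slowdown:
        reasons.append("slower under Bazel")
    if after.coverage_pct < before.coverage_pct - max_coverage_drop:
        reasons.append("coverage dropped")
    if after.failure_rate > before.failure_rate:
        reasons.append("fails more often under Bazel")
    return not reasons, reasons


def triage(results):
    """Split {test: (before, after)} into a batch to enable and a burndown list."""
    ready, burndown = [], {}
    for test, (before, after) in sorted(results.items()):
        ok, reasons = readiness(before, after)
        if ok:
            ready.append(test)
        else:
            burndown[test] = reasons
    return ready, burndown
```

Comparing coverage as well as pass/fail guards against a subtle failure mode. A test can "pass" under Bazel because a missing file silently skipped part of it.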

In tandem with the CI migration, we ensured that developers can run Bazel locally to reproduce and iterate on CI failures. Our migration principles included delivering only what was on par with or better than the existing developer experience and performance. JavaScript tools offer developer-friendly CLI experiences (e.g., watch mode, targeting selected files, rich interactivity) and IDE integrations that we wanted to retain. By default, frontend developers can keep using the tools they know and love, and they can opt into Bazel where it helps. Discrepancies between Bazel and non-Bazel runs are rare, and when they do occur, developers have a way to resolve them. For example, developers can run a single script, failed-on-pr, which re-runs locally any targets that failed CI so issues are easy to reproduce.
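Below is a hypothetical Python sketch of the core of a script like failed-on-pr. The post doesn't describe how failing targets are fetched from CI, so here they are simply passed in. The command uses standard Bazel syntax: `bazel test`, the real `--test_output=errors` flag, and `--` to mark the end of flags before target patterns.

```python
import subprocess


def rerun_command(failed_targets, extra_flags=("--test_output=errors",)):
    """Build a `bazel test` invocation for the targets that failed on the PR.

    Duplicates are dropped and labels sorted so the command is stable.
    """
    targets = sorted(set(failed_targets))
    if not targets:
        return []
    # `--` ends the flags; everything after it is treated as a target pattern.
    return ["bazel", "test", *extra_flags, "--", *targets]


def main(failed_targets):
    """Re-run the failing targets locally and return Bazel's exit code."""
    cmd = rerun_command(failed_targets)
    if not cmd:
        print("No failing targets to reproduce.")
        return 0
    print("Running:", " ".join(cmd))
    return subprocess.run(cmd).returncode
```

Because the same targets and remote cache are used locally and in CI, a rerun like this typically rebuilds little beyond the failing tests themselves.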

Annotations on a failing build with scripts to recreate the failures, e.g. yak script jest:failed-on-pr

We also normalize platform-specific binaries so that we can reuse the cache between Linux and macOS builds. This speeds up local development and CI jobs by sharing the cache between a developer's MacBook and the Linux machines in CI. For native npm packages (node-gyp dependencies), we exclude platform-specific files and build the package on the execution machine, that is, the machine executing the test or build process. We also use “universal binaries” (e.g., for node and zstd). All platform binaries are included as inputs, so the inputs are consistent no matter which platform the action runs on, and the correct binary is selected at runtime.
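Below is a minimal Python sketch of the runtime selection step for a universal binary. It assumes a hypothetical layout in which every platform's binary lives under `tools/<tool>/<os>-<arch>/bin/<tool>` and all of them are declared as action inputs. The action's input set, and therefore its cache key, is then identical on macOS and Linux. Only this lookup differs per machine.

```python
import platform

# Hypothetical layout shipped as action inputs on every platform:
#   tools/<tool>/<os>-<arch>/bin/<tool>
_OS = {"Linux": "linux", "Darwin": "darwin"}
_ARCH = {"x86_64": "x64", "amd64": "x64", "aarch64": "arm64", "arm64": "arm64"}


def select_binary(tool, root="tools", system=None, machine=None):
    """Pick the binary for the platform the action is actually running on."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system not in _OS or machine not in _ARCH:
        raise RuntimeError(f"no {tool} binary for {system}/{machine}")
    return f"{root}/{tool}/{_OS[system]}-{_ARCH[machine]}/bin/{tool}"
```

A thin wrapper would then hand off to the selected path, for example with `os.execv(path, [path, *args])`. The trade-off is uploading every platform's binary as an input in exchange for cache hits across operating systems.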

Adopting Bazel for our core CI jobs yielded significant performance improvements for TypeScript type checking (34% faster), ESLint linting (35% faster), and Jest unit tests (42% faster incremental runs, 29% faster overall). Moreover, our CI can now scale better as the repo grows.

Next, to further improve Bazel performance, we will focus on persisting a warm Bazel host across CI runs, taming our build graph, and powering CI jobs that don't use Bazel with the Bazel build graph. We may also explore SquashFS to further compress and optimize our Bazel sandboxes.

We hope that sharing our journey provides useful insights for organizations considering a Bazel migration for web.

Thanks to Madison Capps, Meghan Dow, Matt Insler, Janusz Kudelka, Joe Lencioni, Rae Liu, James Robinson, Joel Snyder, Elliott Sprehn, Fanying Ye, and countless other internal and external partners who helped bring Bazel to Airbnb.

We're also grateful to the broader Bazel community for being welcoming and sharing ideas.

[1]: This problem is NP-complete, though approximation algorithms have been devised that still guarantee no cycles; we chose the implementation outlined in “Breaking Cycles in Noisy Hierarchies”.

[2]: After an initial evaluation, we considered migrating web asset bundling out of scope (though we may revisit this in the future). The reasons were the high level of effort, unknowns in the bundler landscape, and a neutral return on investment given our recent adoption of Metro. Metro's architecture already accounts for scalability features (e.g., parallelism, local and remote caching, and incremental builds).

[3]: There are newer TS rules that may work well for you here.

[4]: We later switched from gzip to zstd because it produces better-compressed and more deterministic archives, keeping tarballs consistent across different platforms.

[5]: While unnecessary files may still be included, it's a much narrower set (and could be pruned further as an additional optimization).

All product names, logos, and brands are property of their respective owners. All company, product, and service names used on this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.