May 18, 2024

Liwei Guo, Vinicius Carvalho, Anush Moorthy, Aditya Mavlankar, Lishan Zhu

This is the second post in a multi-part series from Netflix. See here for Part 1, which provides an overview of our efforts in rebuilding the Netflix video processing pipeline with microservices. This blog dives into the details of building our Video Encoding Service (VES), and shares our learnings.

Cosmos is the next-generation media computing platform at Netflix. Combining microservice architecture with asynchronous workflows and serverless functions, Cosmos aims to modernize Netflix's media processing pipelines with improved flexibility, efficiency, and developer productivity. Over the past few years, the video team within Encoding Technologies (ET) has been working on rebuilding the entire video pipeline on Cosmos.

This new pipeline consists of a number of microservices, each dedicated to a single functionality. One such microservice is the Video Encoding Service (VES). Encoding is an integral part of the video pipeline. At a high level, it takes an ingested mezzanine and encodes it into a video stream that is suitable for Netflix streaming or serves some studio/production use case. In the case of Netflix, there are a number of requirements for this service:

  • Given the wide range of devices, from mobile phones to browsers to Smart TVs, multiple codec formats, resolutions, and quality levels need to be supported.
  • Chunked encoding is a must to meet the latency requirements of our business needs, and use cases with different levels of latency sensitivity need to be accommodated.
  • The capability of continuous release is key to enabling fast product innovation in both streaming and studio spaces.
  • There is a huge volume of encoding jobs every day. The service needs to be cost-efficient and make the most use of available resources.

In this tech blog, we will walk through how we built VES to achieve the above goals and will share a number of lessons we learned from building microservices. Please note that for simplicity, we have chosen to omit certain Netflix-specific details that are not integral to the primary message of this blog post.

A Cosmos microservice consists of three layers: an API layer (Optimus) that takes in requests, a workflow layer (Plato) that orchestrates the media processing flows, and a serverless computing layer (Stratum) that processes the media. These three layers communicate asynchronously through a home-grown, priority-based messaging system called Timestone. We chose Protobuf as the payload format for its high efficiency and mature cross-platform support.

To help service developers get a head start, the Cosmos platform provides a powerful service generator. This generator features an intuitive UI. With a few clicks, it creates a basic yet complete Cosmos service: code repositories for all three layers are created; all platform capabilities, including discovery, logging, tracing, etc., are enabled; release pipelines are set up and dashboards are readily accessible. We can immediately start adding video encoding logic and deploy the service to the cloud for experimentation.


As the API layer, Optimus serves as the gateway into VES, meaning service users can only interact with VES through Optimus. The defined API interface is a strong contract between VES and the external world. As long as the API is stable, users are shielded from internal changes in VES. This decoupling is instrumental in enabling faster iterations of VES internals.

As a single-purpose service, the API of VES is quite clean. We defined an endpoint encodeVideo that takes an EncodeRequest and returns an EncodeResponse (in an async way through Timestone messages). The EncodeRequest object contains information about the source video as well as the encoding recipe. All the requirements of the encoded video (codec, resolution, etc.) as well as the controls for latency (chunking directives) are exposed through the data model of the encoding recipe.

//protobuf definition

message EncodeRequest {
    VideoSource video_source = 1; // source to be encoded
    Recipe recipe = 2;            // including encoding format, resolution, etc.
}

message EncodeResponse {
    OutputVideo output_video = 1; // encoded video
    Error error = 2;              // error message (optional)
}

message Recipe {
    Codec codec = 1;              // including codec format, profile, level, etc.
    Resolution resolution = 2;
    ChunkingDirectives chunking_directives = 3;
}
Like any other Cosmos service, the platform automatically generates an RPC client based on the VES API data model, which users can use to build the request and invoke VES. Once an incoming request is received, Optimus performs validations, and (when applicable) converts the incoming data into an internal data model before passing it to the next layer, Plato.
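To make the request/validation flow concrete, here is a minimal sketch in Python. All names (`EncodeRequest`, `Recipe`, `validate`, the field names) mirror the data model above but are illustrative stand-ins, not the actual generated client code.

```python
# Hypothetical sketch of building an EncodeRequest and the kind of validation
# Optimus performs before forwarding the request to Plato.
from dataclasses import dataclass


@dataclass
class Recipe:
    codec: str               # e.g. "AVC", "AV1"
    resolution: str          # e.g. "1920x1080"
    chunk_duration_sec: int  # chunking directive controlling latency


@dataclass
class EncodeRequest:
    video_source: str        # location of the ingested mezzanine (made-up path)
    recipe: Recipe


def validate(request: EncodeRequest) -> None:
    """Reject malformed requests before any workflow is started."""
    if not request.video_source:
        raise ValueError("video_source is required")
    if request.recipe.chunk_duration_sec <= 0:
        raise ValueError("chunk duration must be positive")


request = EncodeRequest("s3://mezzanines/title-123.mxf",
                        Recipe("AVC", "1920x1080", 30))
validate(request)  # passes; an invalid request would raise before reaching Plato
```

In the real system the validated request is then converted into an internal data model and handed to the workflow layer asynchronously.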

The workflow layer, Plato, governs the media processing steps. The Cosmos platform supports two programming paradigms for Plato: a forward chaining rule engine and a Directed Acyclic Graph (DAG). VES has a linear workflow, so we chose DAG for its simplicity.

In a DAG, the workflow is represented by nodes and edges. Nodes represent stages in the workflow, while edges signify dependencies: a stage is only ready to execute when all its dependencies have been completed. VES requires parallel encoding of video chunks to meet its latency and resilience goals. This workflow-level parallelism is facilitated by the DAG through a MapReduce mode. Nodes can be annotated to indicate this relationship, and a Reduce node will only be triggered when all its associated Map nodes are ready.

For the VES workflow, we defined five Nodes and their associated edges, which are visualized in the following graph:

  • Splitter Node: This node divides the video into chunks based on the chunking directives in the recipe.
  • Encoder Node: This node encodes a video chunk. It is a Map node.
  • Assembler Node: This node stitches the encoded chunks together. It is a Reduce node.
  • Validator Node: This node performs the validation of the encoded video.
  • Notifier Node: This node notifies the API layer once the entire workflow is completed.

In this workflow, nodes such as the Notifier perform very lightweight operations and can be directly executed in the Plato runtime. However, resource-intensive operations need to be delegated to the computing layer (Stratum), or another service. Plato invokes Stratum Functions for tasks such as encoding and assembling, where the nodes (Encoder and Assembler) post messages to the corresponding message queues. The Validator node calls another Cosmos service, the Video Validation Service, to validate the assembled encoded video.
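The five-stage Map/Reduce shape of the workflow can be sketched as plain functions. This is a toy model of the data flow, not Plato: the real Encoder stage runs as parallel Stratum Functions dispatched through message queues, and the function bodies here are placeholders.

```python
# Toy sketch of the VES DAG: Encoder fans out per chunk (Map), and Assembler
# runs only once every chunk has been encoded (Reduce).
def splitter(video: str, chunk_size: int) -> list[str]:
    """Divide the source into chunks per the chunking directives."""
    return [video[i:i + chunk_size] for i in range(0, len(video), chunk_size)]


def encoder(chunk: str) -> str:       # Map node: one invocation per chunk
    return f"enc({chunk})"


def assembler(encoded: list[str]) -> str:  # Reduce node: joins all Map outputs
    return "+".join(encoded)


def validator(stream: str) -> str:
    assert stream, "empty output"     # stand-in for the Video Validation Service
    return stream


def notifier(stream: str) -> dict:
    return {"status": "COMPLETE", "output": stream}


def run_workflow(video: str, chunk_size: int = 4) -> dict:
    chunks = splitter(video, chunk_size)
    encoded = [encoder(c) for c in chunks]  # executed in parallel in the real DAG
    return notifier(validator(assembler(encoded)))


result = run_workflow("abcdefgh", chunk_size=4)
# → {"status": "COMPLETE", "output": "enc(abcd)+enc(efgh)"}
```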


The computing layer, Stratum, is where media samples can be accessed. Developers of Cosmos services create Stratum Functions to process the media. They can bring their own media processing tools, which are packaged into Docker images of the Functions. These Docker images are then published to our internal Docker registry, part of Titus. In production, Titus automatically scales instances based on the depths of job queues.

VES needs to support encoding source videos into a variety of codec formats, including AVC, AV1, and VP9, to name a few. We use different encoder binaries (referred to simply as "encoders") for different codec formats. For AVC, a format that is now 20 years old, the encoder is quite stable. On the other hand, the newest addition to Netflix streaming, AV1, is continuously going through active improvements and experimentation, necessitating more frequent encoder upgrades. To effectively manage this variability, we decided to create multiple Stratum Functions, each dedicated to a specific codec format and releasable independently. This approach ensures that upgrading one encoder will not impact the VES service for other codec formats, maintaining stability and performance across the board.

Within the Stratum Function, the Cosmos platform provides abstractions for common media access patterns. Regardless of file formats, sources are uniformly presented as locally mounted frames. Similarly, for output that needs to be persisted in the cloud, the platform presents the process as writing to a local file. All details, such as streaming of bytes and retrying on errors, are abstracted away. With the platform taking care of the complexity of the infrastructure, the essential code for video encoding in the Stratum Function could be as simple as follows.

ffmpeg -i input/source%08d.j2k -vf ... -c:v libx264 ... output/encoding.264
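A Stratum Function wrapping such a command might look like the sketch below. The flag values and recipe fields are illustrative assumptions, not Netflix's actual encoding recipe; the point is that the Function only assembles and runs a local command, because the platform mounts the source frames locally and persists the output file to the cloud afterwards.

```python
# Hedged sketch of a Stratum Function body: build an ffmpeg command from a
# recipe and run it against locally mounted inputs. Flag values are invented.
import subprocess


def build_ffmpeg_command(recipe: dict) -> list[str]:
    return [
        "ffmpeg",
        "-i", "input/source%08d.j2k",   # source frames, mounted by the platform
        "-vf", f"scale={recipe['width']}:{recipe['height']}",
        "-c:v", "libx264",
        "-preset", recipe.get("preset", "medium"),
        "output/encoding.264",          # uploaded to the cloud by the platform
    ]


def encode_chunk(recipe: dict) -> None:
    cmd = build_ffmpeg_command(recipe)
    subprocess.run(cmd, check=True)     # raises on encoder failure


cmd = build_ffmpeg_command({"width": 1920, "height": 1080})
```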

Encoding is a resource-intensive process, and the resources required are closely related to the codec format and the encoding recipe. We conducted benchmarking to understand the resource usage pattern, particularly CPU and RAM, for different encoding recipes. Based on the results, we leveraged the "container shaping" feature from the Cosmos platform.

We defined a number of different "container shapes", specifying the allocations of resources like CPU and RAM.

# an example definition of container shape
group: containerShapeExample1
numCpus: 2
memoryInMB: 4000
networkInMbps: 750
diskSizeInMB: 12000

Routing rules are created to assign encoding jobs to different shapes based on the combination of codec format and encoding resolution. This helps the platform perform "bin packing", thereby maximizing resource utilization.
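Conceptually, such routing rules are a lookup from (codec, resolution) to a shape name. The sketch below illustrates the idea; the shape names (beyond `containerShapeExample1` from the example above) and the specific mappings are invented.

```python
# Illustrative routing rules: pick a container shape for a job based on the
# combination of codec format and encoding resolution. Mappings are made up.
ROUTING_RULES: dict[tuple[str, str], str] = {
    ("AVC", "1080p"): "containerShapeExample1",
    ("AVC", "4k"):    "containerShapeLarge",      # hypothetical shape name
    ("AV1", "1080p"): "containerShapeLarge",
    ("AV1", "4k"):    "containerShapeXLarge",     # hypothetical shape name
}


def route(codec: str, resolution: str,
          default: str = "containerShapeExample1") -> str:
    """Return the container shape an encoding job should be scheduled into."""
    return ROUTING_RULES.get((codec, resolution), default)


shape = route("AV1", "4k")  # the most resource-hungry recipe gets the biggest shape
```

Matching jobs to right-sized shapes is what lets the platform bin-pack several heterogeneous encoding containers onto one instance.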

An example of "bin-packing". The circles represent CPU cores and the area represents the RAM. This 16-core EC2 instance is packed with 5 encoding containers (rectangles) of 3 different shapes (indicated by different colors).

After we completed the development and testing of all three layers, VES was launched in production. However, this did not mark the end of our work. Quite the contrary, we believed and still do that a significant part of a service's value is realized through iterations: supporting new business needs, enhancing performance, and improving resilience. An important piece of our vision was for Cosmos services to have the ability to continuously release code changes to production in a safe manner.

Focused on a single functionality, code changes pertaining to a single feature addition in VES are generally small and cohesive, making them easy to review. Since callers can only interact with VES through its API, internal code is truly "implementation details" that are safe to change. The explicit API contract limits the test surface of VES. Additionally, the Cosmos platform provides a pyramid-based testing framework to guide developers in creating tests at different levels.

After testing and code review, changes are merged and are ready for release. The release pipeline is fully automated: after the merge, the pipeline checks out code, compiles, builds, runs unit/integration/end-to-end tests as prescribed, and proceeds to full deployment if no issues are encountered. Typically, it takes around 30 minutes from code merge to feature landing (a process that took 2–4 weeks in our previous generation platform!). The short release cycle provides faster feedback to developers and helps them make necessary updates while the context is still fresh.

Screenshot of a release pipeline run in our production environment

While running in production, the service constantly emits metrics and logs. They are collected by the platform to visualize dashboards and to drive monitoring/alerting systems. Metrics deviating too much from the baseline will trigger alerts and can lead to automatic service rollback (when the "canary" feature is enabled).
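The core of that baseline-deviation check can be illustrated with a few lines of arithmetic. This is a toy model under invented assumptions (a fixed 25% tolerance on the mean of a recent window); the platform's actual canary analysis is more sophisticated.

```python
# Toy sketch of the canary idea: flag a rollback when a metric's recent mean
# deviates from its baseline by more than a tolerance. Thresholds are invented.
def should_rollback(baseline: float, samples: list[float],
                    tolerance: float = 0.25) -> bool:
    """True if the mean of recent samples deviates >25% from the baseline."""
    mean = sum(samples) / len(samples)
    return abs(mean - baseline) / baseline > tolerance


healthy = should_rollback(100.0, [98, 103, 101])    # small drift: keep running
degraded = should_rollback(100.0, [160, 170, 155])  # large spike: roll back
```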

VES was the very first microservice that our team built. We started with rudimentary knowledge of microservices and learned a multitude of lessons along the way. These learnings deepened our understanding of microservices and have helped us improve our design choices and decisions.

Define a Proper Service Scope

A principle of microservice architecture is that a service should be built for a single functionality. This sounds straightforward, but what exactly qualifies as a "single functionality"? "Encoding video" sounds good, but wouldn't "encode video into the AVC format" be an even more specific single functionality?

When we started building the VES, we took the approach of creating a separate encoding service for each codec format. While this has advantages such as decoupled workflows, we were quickly overwhelmed by the development overhead. Imagine that a user requested us to add the watermarking capability to the encoding. We needed to make changes to multiple microservices. What is worse, the changes in all these services were very similar, and essentially we were adding the same code (and tests) over and over again. This kind of repetitive work can easily wear out developers.

The service presented in this blog is our second iteration of VES (yes, we already went through one iteration). In this version, we consolidated encodings for different codec formats into a single service. They share the same API and workflow, while each codec format has its own Stratum Functions. So far this seems to strike a good balance: the common API and workflow reduce code repetition, while separate Stratum Functions guarantee independent evolution of each codec format.

The changes we made are not irreversible. If someday in the future the encoding of one particular codec format evolves into a totally different workflow, we have the option to spin it off into its own microservice.

Be Pragmatic about Data Modeling

In the beginning, we were very strict about data model separation: we had a strong belief that sharing equates to coupling, and coupling could lead to potential disasters in the future. To avoid this, for each service as well as for the three layers within a service, we defined its own data model and built converters to translate between the different data models.

We ended up creating multiple data models for aspects such as bit-depth and resolution across our system. To be fair, this does have some merits. For example, our encoding pipeline supports different bit-depths for AVC encoding (8-bit) and AV1 encoding (10-bit). By defining both AVC.BitDepth and AV1.BitDepth, constraints on the bit-depth can be built into the data models. However, it is debatable whether the benefit of this differentiation power outweighs the downsides, namely multiple data model translations.

Eventually, we created a library to host data models for common concepts in the video domain. Examples of such concepts include frame rate, scan type, color space, etc. As you can see, they are extremely common and stable. This "common" data model library is shared across all services owned by the video team, avoiding unnecessary duplications and data conversions. Within each service, additional data models are defined for service-specific objects.
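A sketch of what such a shared library might contain is below. The class names and fields are illustrative, not the actual library; the point is that these concepts are stable enough to define once and share across services.

```python
# Hedged sketch of a shared "common concepts" library for the video domain.
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction


class ScanType(Enum):
    PROGRESSIVE = "progressive"
    INTERLACED = "interlaced"


@dataclass(frozen=True)
class FrameRate:
    """Exact rational frame rate, avoiding float drift for NTSC rates."""
    numerator: int
    denominator: int = 1

    def as_fraction(self) -> Fraction:
        return Fraction(self.numerator, self.denominator)


@dataclass(frozen=True)
class ColorSpace:
    primaries: str  # e.g. "bt709", "bt2020"
    transfer: str   # e.g. "sdr", "pq"


# Shared across services: the 24000/1001 (~23.976 fps) film rate, stored exactly
rate = FrameRate(24000, 1001)
```

Keeping such values exact (rather than as floats) is one reason a single, carefully designed shared model beats per-service redefinitions.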

Embrace Service API Changes

This may sound contradictory. We have been saying that an API is a strong contract between the service and its users, and keeping an API stable shields users from internal changes. This is absolutely true. However, none of us had a crystal ball when we were designing the very first version of the service API. It is inevitable that at a certain point, the API becomes inadequate. If we hold the belief that "the API cannot change" too dearly, developers will be forced to find workarounds, which are almost certainly sub-optimal.

There are many great tech articles about gracefully evolving APIs. We believe we also have a unique advantage: VES is a service internal to Netflix Encoding Technologies (ET). Our two users, the Streaming Workflow Orchestrator and the Studio Workflow Orchestrator, are owned by the workflow team within ET. Our teams share the same contexts and work towards common goals. If we believe updating the API is in the best interest of Netflix, we meet with them to seek alignment. Once a consensus to update the API is reached, the teams collaborate to ensure a smooth transition.

This is the second part of our tech blog series Rebuilding Netflix Video Pipeline with Microservices. In this post, we described the process of building the Video Encoding Service (VES) in detail, as well as our learnings. Our pipeline includes a few other services that we plan to share about as well. Stay tuned for our future blogs on this topic of microservices!