Pinterest Engineering Blog

Mar 12, 2024 · 11 min read

Monil Mukesh Sanghavi | Software Engineer, Real Time Analytics Team; Xiao Li | Software Engineer, Real Time Analytics Team; Ming-May Hu | Software Engineer, Real Time Analytics Team; Zhenxiao Luo | Software Engineer, Real Time Analytics Team; Kapil Bajaj | Manager, Real Time Analytics Team


At Pinterest, one of the pillars of the observability stack provides internal engineering teams (our users) the ability to monitor their services using metrics data and set up alerting on it. Goku is our in-house time series database providing cost efficient and low latency storage for metrics data. Under the hood, Goku is not a single cluster but a collection of sub-service components including:

  • Goku Short Term (in-memory storage for the last 24 hours of data, referred to as GokuS)
  • Goku Long Term (SSD- and HDD-based storage for older data, referred to as GokuL)
  • Goku Compactor (time series data aggregation and conversion engine)
  • Goku Root (smart query routing)

You can read more about these components in the blog posts on GokuS Storage, GokuL (long term) storage, and Cost Savings on Goku, but a lot has changed in Goku since those were written. We have implemented several features that increased the efficiency of Goku and improved the user experience. In this 3-part blog post series, we will cover the efficiency improvements in 3 major aspects:

  1. Improving recovery time of both GokuS and GokuL (this is the total time a single host or cluster in Goku takes to come up and start serving time series queries)
  2. Improving query experience in Goku by lowering latencies of expensive and high cardinality queries
  3. Reducing the overall cost of Goku at Pinterest

We will also share some learnings and takeaways from using Goku for storing metrics at Pinterest.

This second blog post focuses on how Goku time series queries were improved. We will provide a brief overview of Goku's time series data model, query model, and architecture. We will then cover the improvements we added, including rollup, pre-aggregation, and pagination.

The data model of a time series in Goku is very similar to the data model of OpenTSDB (which Goku replaced). You can find more details here. Here's a quick overview of the Goku TimeSeries data model.

The metadata, or key, of a time series consists of the following:

  • Metric Name: proc.stat.cpu
  • Tag Value Combination 1: host=abc
  • Tag Value Combination 2: cluster=goku
  • Tag Value Combination 3: az=us-east-1a
  • …
  • Tag Value Combination n: os=ubuntu-1

The data part of a time series, which we refer to as the time series stream, consists of data points that are time-value pairs, where time is in Unix time and value is a numerical value.

  • Data point 1 — Timestamp: 16:00, Value: 3.0
  • Data point 2 — Timestamp: 16:01, Value: 4.2
  • Data point 3 — Timestamp: 16:02, Value: 5.2
  • …
  • Data point n — Timestamp: 16:59, Value: 4.0
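To make the model concrete, here is a minimal Python sketch of a time series key and its stream. The class and field names are ours, for illustration only, and are not Goku's actual internal representation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass(frozen=True)
class TimeSeriesKey:
    """Metadata identifying a unique time series: metric name plus tag key/value pairs."""
    metric_name: str                       # e.g. "proc.stat.cpu"
    tags: Tuple[Tuple[str, str], ...]      # e.g. (("host", "abc"), ("cluster", "goku"))

@dataclass
class TimeSeriesStream:
    """Data part of a time series: (unix_time, value) pairs."""
    points: List[Tuple[int, float]] = field(default_factory=list)

# Example: one series for proc.stat.cpu emitted by host "abc".
key = TimeSeriesKey(
    metric_name="proc.stat.cpu",
    tags=(("host", "abc"), ("cluster", "goku"), ("az", "us-east-1a"), ("os", "ubuntu-1")),
)
stream = TimeSeriesStream(points=[(1690000000, 3.0), (1690000060, 4.2), (1690000120, 5.2)])
```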

Multiple hosts can emit time series for a single metric name, for example: cpu, memory, disk usage, or some application metric. The host-specific information is part of one of the tags mentioned above (for example, tag key == host and tag value == host name).

[Table: TimeSeries number, Metric Name, Tag Value 1, Tag Value 2, Tag Value 3, … Tag Value n]

The cardinality of a metric (i.e. metric name) is defined as the total number of unique time series for that metric name. A unique time series has a unique combination of tag keys and values. You can read more about cardinality here.

For example, the cardinality of the metric name "proc.stat.cpu" in the above table is 5, because the combination of tag value pairs along with the metric name in each of these 5 time series does not repeat. Similarly, the cardinality of the metric name "proc.stat.mem" is 3. Note how we represent a particular string (be it metric name or tag value) as a unique color. This is to show that a certain tag value pair can be present in multiple time series, but the combination of such strings is what makes a time series unique.

Goku uses Apache Thrift for its query RPC. The query model of Goku is very similar to OpenTSDB's query model specified here. To summarize, a query to Goku Root is similar to the request specified below:
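The original post shows the Thrift request structure here; as a stand-in, below is a rough Python sketch of the fields such a request carries, based on the options described next. The names and types are illustrative, not the actual Thrift IDL:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class FilterType(Enum):
    WILDCARD = 1
    REGEX_MATCH = 2
    INCLUDE = 3
    EXCLUDE = 4

@dataclass
class Filter:
    type: FilterType
    tag_key: str
    tag_values: List[str]          # one or more values to include/exclude/match

@dataclass
class QueryRequest:
    metric_name: str               # metric name without the tag combinations
    filters: List[Filter]          # filters on tag values
    aggregator: str                # sum / max / min / p99 / count / mean / median ...
    start_time: int                # unix time
    end_time: int                  # unix time
    downsample_interval: Optional[int] = None   # user specified granularity (seconds)
    rollup_aggregator: Optional[str] = None     # rollup aggregation for long range queries
    rollup_interval: Optional[int] = None       # rollup interval for long range queries
```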

Let's go over the important fields in the request structure above:

  • metricName — metric name without the tag combinations
  • list<Filter> — filters on tag values, like pattern match, wildcard, include/exclude tag value (can be multiple), etc.
  • Aggregator — sum/max/min/p99/count/mean/median etc. applied to the group of time series
  • Downsample — user-specified granularity in time of the returned results
  • Rollup aggregation/interval — downsampling at the individual time series level. This option becomes mandatory in long range queries (you will see the reason below in Rollup).
  • startTime, endTime — time range of the query

The query response looks as follows:
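The actual Thrift response is not reproduced here either; a rough sketch of what it carries (again with illustrative names) could look like:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class TimeSeriesResult:
    group_tags: Dict[str, str]           # tag key/values this grouped series represents
    points: List[Tuple[int, float]]      # (unix_time, value) pairs after aggregation/downsampling

@dataclass
class QueryResponse:
    metric_name: str
    results: List[TimeSeriesResult]      # one entry per surviving group after filtering + aggregation
```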

The monitoring and alerting framework at Pinterest (internally known as statsboard) has a query client that sends a QueryRequest to Goku Root, which forwards it to the leaf clusters (GokuS and/or GokuL) based on the query time range and the shards they host. The leaf clusters do the required grouping (filtering), interpolation, aggregation, and downsampling as needed and reply to Goku Root with a QueryResponse. The Root will again do aggregation if necessary and reply to the statsboard query client with the final QueryResponse.
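As an illustration of this two-level aggregation, and assuming a simple sum aggregator (this is not Goku's actual merge code), the Root could combine the partial results returned by the leaves roughly like this:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def merge_leaf_results(leaf_streams: List[List[Tuple[int, float]]]) -> List[Tuple[int, float]]:
    """Re-aggregate partially aggregated 'sum' streams from multiple leaf clusters
    by summing values that share a timestamp."""
    merged: Dict[int, float] = defaultdict(float)
    for stream in leaf_streams:
        for ts, value in stream:
            merged[ts] += value
    return sorted(merged.items())

# Two leaves each return a partially aggregated stream for the same query.
print(merge_leaf_results([[(60, 1.0), (120, 2.0)], [(60, 0.5), (180, 3.0)]]))
# [(60, 1.5), (120, 2.0), (180, 3.0)]
```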

Let's now take a look at how we improved the query experience.

Goku supports a base time granularity of one second in the time series stream. However, having such fine granularity can hurt query performance for the following reasons:

  • Too much data (too many data points) over the network for a non-downsampled raw query
  • Expensive computation, and hence CPU cost, while aggregating because of too many data points
  • Time consuming data fetch, especially for GokuL (which uses SSD and HDD for data storage)

For older metric data residing in GokuL, we decided to also store rolled up data to improve query latency. Rolling up means reducing the granularity of the time series data points by storing aggregated values for the chosen interval. For example, a raw time series stream

when aggregated using a rollup interval of 5 and rollup aggregators of sum, min, max, count, and average will produce 5 shorter time series streams, as follows:
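The original post illustrates the resulting streams with a figure. As a rough stand-in, the sketch below buckets a raw stream (interval in seconds, purely for illustration) and produces the five rolled up streams:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def rollup(points: List[Tuple[int, float]], interval: int) -> Dict[str, List[Tuple[int, float]]]:
    """Illustrative rollup: bucket raw points into `interval`-sized windows and keep
    one aggregated value per window for each rollup aggregator."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    streams: Dict[str, List[Tuple[int, float]]] = {"sum": [], "min": [], "max": [], "count": [], "avg": []}
    for bucket_ts in sorted(buckets):
        values = buckets[bucket_ts]
        streams["sum"].append((bucket_ts, sum(values)))
        streams["min"].append((bucket_ts, min(values)))
        streams["max"].append((bucket_ts, max(values)))
        streams["count"].append((bucket_ts, float(len(values))))
        streams["avg"].append((bucket_ts, sum(values) / len(values)))
    return streams

raw = [(0, 1.0), (1, 2.0), (2, 4.0), (5, 3.0), (6, 5.0)]
print(rollup(raw, interval=5)["avg"])   # [(0, 2.33...), (5, 4.0)]
```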

The following table explains the tiering and rollup strategy:

Rollup benefitted the GokuL service in 3 ways:

  • Reduced the storage cost of abundant raw data
  • Reduced the data fetch cost from SSD, reduced the CPU aggregation cost, and thus reduced the query latency
  • Some queries that would time out on the HBase clusters backing OpenTSDB now return successful query results from GokuL

The rollup aggregation is done in the Goku Compactor (explained here) before it creates the SST files containing the time series data to be stored in the RocksDB-based GokuL instances.

In production, we observe that the p99 latency of queries using rolled up data is almost 1000x lower than that of queries using raw data.

[Chart: p99 latency for GokuL queries using raw data is almost several seconds]
[Chart: p99 latency for GokuL queries using rollup data is in milliseconds]

At query time, Goku responds with an exception stating "cardinality limit exceeded" if the number of time series the query would select/read post filtering exceeds a pre-configured limit. This protects Goku's system resources from noisy, expensive queries. We observed queries for high cardinality metrics hitting timeouts, chewing up system resources, and affecting the otherwise low latency queries. Often, after analyzing the high cardinality or timed-out queries, we found that the tag(s) contributing to the high cardinality of the metric were not even needed by the user in the final query result.

The pre-aggregation feature was introduced with the goal of removing these unwanted tags in the pre-aggregated metrics, thereby reducing the original cardinality, reducing query latency, and successfully serving query results to the user without timing out or consuming a lot of system resources. The feature creates and stores aggregated time series by removing the unnecessary tags that the user specifies. The aggregated time series keeps only the tags that the user has specifically asked to preserve. For example:

If the user asks to enable pre-aggregation for the metric "app.some_stat" and wants to preserve only the cluster and az information, the pre-aggregated time series will look like this:
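The table from the original post is not reproduced here; the sketch below uses five made-up series for "app.some_stat" (tag values are hypothetical) to show how dropping the host tag and preserving only cluster and az collapses them:

```python
from typing import Dict, List, Tuple

# Five hypothetical raw series for "app.some_stat" (tag values invented for illustration).
raw_series_tags: List[Dict[str, str]] = [
    {"host": "h1", "cluster": "goku", "az": "us-east-1a"},
    {"host": "h2", "cluster": "goku", "az": "us-east-1a"},
    {"host": "h3", "cluster": "goku", "az": "us-east-1b"},
    {"host": "h4", "cluster": "gokul", "az": "us-east-1a"},
    {"host": "h5", "cluster": "gokul", "az": "us-east-1a"},
]

def preaggregated_keys(series_tags: List[Dict[str, str]], preserve: Tuple[str, ...]) -> set:
    """Keep only the grouping tags the user asked to preserve; everything else is dropped."""
    return {tuple((k, tags[k]) for k in preserve) for tags in series_tags}

keys = preaggregated_keys(raw_series_tags, preserve=("cluster", "az"))
print(len(raw_series_tags), "raw series ->", len(keys), "pre-aggregated series")
# 5 raw series -> 3 pre-aggregated series
```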

Note how the cardinality of the pre-aggregated metric is reduced from 5 to 3.

The pre-aggregated metrics are new time series created within Goku that do not change the original raw time series. Also, for the sake of simplicity, we decided not to introduce these metrics back into the regular ingestion pipeline that we emit to Kafka.

Here is the flow for enabling pre-aggregation:

  1. Users experiencing high latency queries, or queries hitting the "cardinality limit exceeded" timeout, decide to enable pre-aggregation for the metric.
  2. The Goku team provides the tag combination distribution of the metric to the user.

  3. Users decide on the tags they want to preserve in the pre-aggregated time series. The "to be preserved" tags are referred to as grouping tags. There is also an optional provision to preserve a particular tag key == tag value combination and discard all other tag value combinations for that tag key. These are referred to as conditional tags.

  4. The user is notified of the reduced cardinality and, once the user confirms, pre-aggregation is enabled for the metric.

Write path change:

After consuming a data point for a metric from Kafka, the Goku Short Term host checks whether the time series qualifies to be pre-aggregated. If it does, the value of the data point is entered into an in-memory data structure, which records the sum, max, min, count, and mean of the data seen so far. Every minute, the data structure emits 5 aggregated data points (the aggregations mentioned above) for the time series under an internally modified Goku metric name.
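A minimal sketch of what such a per-minute structure could look like follows; the field names and the internal metric naming scheme are assumptions for illustration, not Goku's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class PreAggBucket:
    """Rough sketch of an in-memory pre-aggregation bucket a GokuS host could keep
    per pre-aggregated series, per minute."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        """Fold one consumed data point into the running aggregates."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    def flush(self, metric: str, minute_ts: int):
        """Emit the five aggregated data points for this minute under an
        internally modified metric name (prefix is hypothetical)."""
        mean = self.total / self.count if self.count else 0.0
        internal = f"__preagg__.{metric}"
        return [
            (f"{internal}.sum", minute_ts, self.total),
            (f"{internal}.min", minute_ts, self.minimum),
            (f"{internal}.max", minute_ts, self.maximum),
            (f"{internal}.count", minute_ts, float(self.count)),
            (f"{internal}.mean", minute_ts, mean),
        ]

bucket = PreAggBucket()
for v in (3.0, 4.2, 5.2):
    bucket.add(v)
print(bucket.flush("app.some_stat", minute_ts=1690000020))
```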

Read path change:

In the query request to Goku Root, the observability statsboard client sends a boolean that determines whether the pre-aggregated version of the metric should be queried. Goku Root makes the corresponding metric name change to query the actual time series.
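Conceptually, the Root-side switch could look like the sketch below; the prefix is a hypothetical placeholder matching the write path sketch above, not Goku's real naming:

```python
def resolve_metric_name(metric: str, use_pre_aggregated: bool) -> str:
    """Pick the internal pre-aggregated series name when the client sets the boolean."""
    return f"__preagg__.{metric}" if use_pre_aggregated else metric

print(resolve_metric_name("app.some_stat", use_pre_aggregated=True))
```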

Success story: one production metric (in the example provided above) stored in Goku, with alerts set on it, was seeing high cardinality exceptions (cardinality ~32M during peak hours).

We reached out to the user to understand the use case and suggested enabling pre-aggregation for their metric. Once we enabled pre-aggregation, the queries completed successfully with latencies below 100ms.

We have onboarded more than 50 use cases for pre-aggregation.

During the rollout to production, a query timeout feature had to be implemented in Goku Long Term to prevent an expensive query from consuming server resources for a long time. This, however, meant that users of expensive queries saw timeouts, and server resources were still wasted even if only for a short period (i.e. the configured query timeout). To address this, the pagination feature was introduced, which promises a non-timed-out result to the end user of an expensive query, even though it may take longer than usual. It also breaks up and plans the query in such a way that resource utilization on the server stays controlled.

The workflow of the pagination feature is:

  1. The query client sends a PagedQueryRequest to Goku Root if the metric is in the list of pagination supported metrics.
  2. Goku Root plans the query based on time slicing.
  3. Goku Root and the query client then have a series of request-response exchanges with the root server. Each response gives the query client a hint of what the next start and end time range of the query should be, along with the root server's own IP address so that the traffic-managing Envoy can route the follow-up query to the right server (see the sketch after this list).
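A minimal sketch of the time-slicing idea follows; the slice size and function shape are assumptions for illustration, not Goku's actual query planner:

```python
from typing import Iterator, Tuple

def plan_time_slices(start_time: int, end_time: int, slice_seconds: int) -> Iterator[Tuple[int, int]]:
    """Break one expensive query's [start, end) range into smaller windows that the
    client fetches page by page, one PagedQueryRequest per slice."""
    cursor = start_time
    while cursor < end_time:
        yield cursor, min(cursor + slice_seconds, end_time)
        cursor += slice_seconds

# A client could walk a 6-hour query in 1-hour pages.
for page_start, page_end in plan_time_slices(0, 6 * 3600, 3600):
    print(page_start, page_end)
```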

We have onboarded ~10 use cases in production.

The following are ideas we have to further improve the query experience in Goku:

Tag-based aggregation in Goku

During compaction, generate pre-aggregated time series by aggregating on the high cardinality contributing tags like host, etc. Work with the client teams to identify such tags. This will generate additional time series and increase the storage cost, but not by much. At query time, if the high cardinality tags are not present in the query, the leaf server will automatically serve the query using the pre-aggregated time series.

Currently, the observability team already has a feature in place to remove the high cardinality contributing host tag from a set of long term metrics. In the future, this feature can make use of the tag-based aggregation support in Goku, or Goku can provide pointers to the observability team, based on the query analysis above, to include more long term metrics in their list.

Post-query processing support in Goku

Many users of statsboard use tscript post-query processing to further process their results. Pushing this processing layer into Goku can provide the following benefits:

  1. Leverages the additional compute resources available at the Goku Root and Goku Leaf (GokuS and GokuL) clusters
  2. Less data over the network, leading to possibly lower query latencies

Some examples of post-query processing support include finding the top N time series, summing the time series, etc.
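For instance, a top-N step pushed into Goku could conceptually look like the sketch below; this is purely illustrative, and the actual tscript semantics in statsboard may differ:

```python
from typing import Dict, List, Tuple

def top_n(series: Dict[str, List[Tuple[int, float]]], n: int) -> List[str]:
    """Rank already-aggregated series by the sum of their values and keep the top N,
    the kind of work that could be pushed down from the statsboard client into Goku."""
    totals = {name: sum(v for _, v in points) for name, points in series.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]

print(top_n({"a": [(0, 1.0), (60, 2.0)], "b": [(0, 5.0)], "c": [(0, 0.5)]}, n=2))  # ['b', 'a']
```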

Backfilling support in pre-aggregation

We currently do not support pre-aggregated queries for a metric for a time range that falls before the time the metric was configured for pre-aggregation. For example, if a metric was enabled for pre-aggregation on 1st Jan 2022 00:00:00, users won't be able to query pre-aggregated data for times before 31st Dec 2021 23:59:59. By supporting pre-aggregation during compaction, we can remove this limit, and slowly but steadily (as larger tier buckets start forming), users will start seeing pre-aggregated data for older time ranges.

SQL support

Currently, Goku is queryable only through a Thrift RPC interface. SQL is widely used as a querying framework for data, and having SQL support in Goku would significantly help analytical use cases. We are starting to see increasing demand for this and are exploring solutions.

Read from S3

The ability to store data in and read it from S3 would help Goku extend the TTL of raw data, and even extend the TTL of queryable metrics data. It would also be cost effective for storing metrics that are infrequently used.

Special thanks to Rui Zhang, Hao Jiang, and Miao Wang for their efforts in supporting the above features. A huge thanks to the Observability team for their help and support for these features on the user facing side.

In the next blog post, we will focus on how we brought down the cost of the Goku service(s).

To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site. To explore and apply to open roles, visit our Careers page.