February 8, 2025
Alex Deng
The Airbnb Tech Blog

10 min read

Dec 22, 2023

KDD (Knowledge Discovery and Data Mining) is a flagship conference in data science research. Hosted annually by a special interest group of the Association for Computing Machinery (ACM), it’s where you’ll learn about some of the most groundbreaking developments in data mining, knowledge discovery, and large-scale data analytics.

Airbnb had a significant presence at KDD 2023, with two papers accepted into the main conference proceedings and 11 talks and presentations. In this blog post, we’ll summarize our team’s contributions and share highlights from an exciting week of research talks, workshops, panel discussions, and more.

Although search ranking is a problem that researchers have been working on for decades, there are still many nuances to explore. For example, at Airbnb, guests often search over a period of days or even weeks, not minutes. And because we are a two-sided marketplace, there are factors like the possibility that hosts cancel a booking that we’d like to account for in ranking.

Optimizing Airbnb Search Journey with Multi-task Learning, our paper accepted at KDD 2023, presents Journey Ranker, a new multi-task deep learning model. The core insight is that for this kind of long-term search task, we want to optimize for the intermediate steps in the user journey.

The Journey Ranker base module helps guests reach positive milestones. There’s also a Twiddler module that helps guests avoid negative milestones. The modules work off a shared feature representation of listing and guest context, and their output scores are combined.
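To make the two-module idea concrete, here is a toy scoring sketch. It is not the Journey Ranker architecture from the paper: the layer sizes, the weights, chaining per-milestone probabilities in the base module, and discounting the score by a single negative-milestone probability are all illustrative assumptions, and `ToyJourneyRanker` is a hypothetical name.

```python
import math
from dataclasses import dataclass
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class ToyJourneyRanker:
    shared: List[List[float]]          # shared layer: one weight row per hidden unit
    positive_heads: List[List[float]]  # base module: one head per positive milestone, in journey order
    negative_head: List[float]         # Twiddler-style head for one negative milestone (e.g., cancellation)
    penalty: float = 1.0               # in [0, 1]; how strongly negative-milestone risk discounts the score

    def embed(self, features: List[float]) -> List[float]:
        # Shared representation of listing and guest context (a single ReLU layer).
        return [max(0.0, dot(row, features)) for row in self.shared]

    def score(self, features: List[float]) -> float:
        h = self.embed(features)
        # Base module: multiply conditional probabilities of successive positive milestones,
        # so the product approximates P(guest reaches the last milestone).
        p_journey = 1.0
        for head in self.positive_heads:
            p_journey *= sigmoid(dot(head, h))
        # Twiddler-style module: probability of the negative milestone.
        p_negative = sigmoid(dot(self.negative_head, h))
        # Combination: discount the base score by the negative-milestone risk.
        return p_journey * (1.0 - self.penalty * p_negative)


ranker = ToyJourneyRanker(
    shared=[[1.0, 0.0], [0.0, 1.0]],
    positive_heads=[[0.0, 0.0], [0.0, 0.0]],
    negative_head=[0.0, 0.0],
)
print(ranker.score([1.0, 2.0]))  # 0.125
```

With all-zero head weights every sigmoid returns 0.5, so two positive milestones and one negative milestone give 0.5 × 0.5 × (1 − 0.5) = 0.125. Adding a head for another milestone only changes the head list, which loosely mirrors the modularity described in the next paragraph.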

Because of its modular design, Journey Ranker can be used whenever there are positive or negative milestones to consider. We’ve implemented it in several Airbnb search and other products to drive improvements in business metrics.

We also co-presented a tutorial on Data-Centric AI (DCAI). DCAI is a fast-growing field in deep learning because, as model design matures, innovation is increasingly driven by data. We shared DCAI best practices and trends for developing training data, developing inference data, maintaining data, and creating benchmarks, with many examples from working with LLMs.

Online experimentation (e.g., A/B testing) is a common way for organizations like Airbnb to make data-driven decisions. But high variance is frequently a challenge. For example, it’s hard to prove that a change in our search UX will drive value when bookings are infrequent and depend on many interactions over a long period of time.

Our paper Variance Reduction Using In-Experiment Data: Efficient and Targeted Online Measurement for Sparse and Delayed Outcomes presents two new methods for variance reduction that rely exclusively on in-experiment data:

  1. A framework for a model-based leading indicator metric that continually estimates progress toward a delayed binary outcome.
  2. A counterfactual treatment exposure index that quantifies how much a user is impacted by the treatment.
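The intuition behind the first method can be checked with a small simulation. In this toy (not the paper’s framework), a hypothetical pre-trained model, `leading_indicator`, maps an in-experiment engagement signal to the probability of an eventual booking, and each user’s delayed outcome is a Bernoulli draw from that probability. Because the treatment affects bookings only through engagement in this setup, both metrics estimate the same effect, but averaging the predicted probability removes the coin-flip noise: by the law of total variance, Var(E[Y | X]) ≤ Var(Y).

```python
import math
import random
import statistics
from typing import List, Tuple


def leading_indicator(engagement: float) -> float:
    """Hypothetical pre-trained model: P(eventual booking | in-experiment engagement)."""
    return 1.0 / (1.0 + math.exp(-(engagement - 4.0)))


def simulate_arm(n: int, lift: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    outcomes, indicators = [], []
    for _ in range(n):
        engagement = rng.gauss(0.0, 1.0) + lift  # in-experiment signal; treatment shifts it
        p = leading_indicator(engagement)
        indicators.append(p)  # available as soon as the user is exposed
        outcomes.append(1.0 if rng.random() < p else 0.0)  # sparse binary outcome, observed late
    return outcomes, indicators


def diff_in_means_se(a: List[float], b: List[float]) -> float:
    """Standard error of the difference in means between two independent arms."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))


rng = random.Random(7)
y_control, m_control = simulate_arm(20_000, lift=0.0, rng=rng)
y_treat, m_treat = simulate_arm(20_000, lift=0.1, rng=rng)

se_outcome = diff_in_means_se(y_treat, y_control)
se_indicator = diff_in_means_se(m_treat, m_control)
print(f"SE using the delayed outcome:   {se_outcome:.5f}")
print(f"SE using the leading indicator: {se_indicator:.5f}")
print(f"variance reduction: {1 - (se_indicator / se_outcome) ** 2:.0%}")
```

The size of the reduction here is an artifact of the toy setup. In practice, an indicator is only useful if its treatment effect tracks the effect on the real outcome, which is why alignment with the target booking metric matters.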

In testing, both methods achieved a variance reduction of 50% or more. These methods have greatly improved our experimentation efficiency and impact.

With more than 50% variance reduction, the new model-based leading indicator metric (listing-view utility, on the right) aligns with the target uncancelled booking metric much better than other indicators such as listing-view with dates (on the left).

Another interesting challenge in online experimentation is avoiding interference bias, which can occur when your A/B test subjects compete with each other. For example, if you ran an A/B test where group B saw lower booking prices, group B might “cannibalize” bookings from group A. There are two imperfect solutions: clustering (isolating the choices available to participants) and switchbacks (grouping participants by time intervals). Airbnb presented a keynote talk on this topic at KDD’s 2nd Workshop on Decision Intelligence and Analytics for Online Marketplaces.
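A switchback design can be sketched as deterministic hashing of (region, time window) into an arm, so every request in the same region and window sees the same variant and, within a region, the two variants never compete for inventory at the same time. The salt, the two-hour window, and keying on a region label are illustrative assumptions; hashing a geographic cluster alone, without the time window, would give the clustering design instead.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def switchback_arm(region: str, ts: datetime, interval_hours: int = 2, salt: str = "pricing-exp") -> str:
    """Map every request in the same (region, time window) to the same arm."""
    window = int(ts.timestamp()) // (interval_hours * 3600)  # index of the time window
    digest = hashlib.sha256(f"{salt}:{region}:{window}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


start = datetime(2023, 8, 1, tzinfo=timezone.utc)
schedule = [switchback_arm("region-a", start + timedelta(hours=2 * k)) for k in range(12)]
print(schedule)  # one arm per two-hour window over a day
```

Outcomes are then compared between treatment windows and control windows; a common refinement is to drop a short burn-in after each switch so carryover from the previous window does not leak into the comparison.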

Also at the workshop, we presented the paper The Price is Right: Removing A/B Test Bias in a Marketplace of Expirable Goods. It discusses the problem of lead-day bias, which arises in markets where items like concert tickets, air travel, and Airbnb bookings vary in price based on how far they are from their expiration date. This can wreak havoc on A/B tests, and in the paper we present several mitigation methods, such as limited rollout, smart overlapping of experiments, and a Heterogeneous Treatment Effect (HTE) remixed estimator, to correct for the bias and accelerate the R&D process.

Together with limited rollout and smart overlapping of experiments, the HTE-remixed estimator can provide sufficiently robust estimates of the long-term experiment impact from short-term results and significantly shorten experiment run time.
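One way to read “remixing” is post-stratification: estimate the treatment effect separately for each lead-day bucket from the short experiment, then reweight those effects by the lead-day mix expected once the experiment has run long enough. The sketch below implements that reading only; the bucket names, the target mix, and the renormalization over buckets with data are our assumptions, not the estimator from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def stratum_effects(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """rows are (lead_day_bucket, arm, outcome); returns per-bucket difference in means."""
    # bucket -> arm -> [sum of outcomes, count]
    acc = defaultdict(lambda: {"treatment": [0.0, 0], "control": [0.0, 0]})
    for bucket, arm, y in rows:
        acc[bucket][arm][0] += y
        acc[bucket][arm][1] += 1
    effects = {}
    for bucket, arms in acc.items():
        (sum_t, n_t), (sum_c, n_c) = arms["treatment"], arms["control"]
        if n_t and n_c:  # skip buckets where one arm has no data yet
            effects[bucket] = sum_t / n_t - sum_c / n_c
    return effects


def remixed_effect(effects: Dict[str, float], target_mix: Dict[str, float]) -> float:
    """Reweight per-bucket effects by the long-run share of each lead-day bucket."""
    covered = {b: w for b, w in target_mix.items() if b in effects}
    total = sum(covered.values())  # renormalize over buckets we could estimate
    return sum(effects[b] * w / total for b, w in covered.items())


# Toy short-term data: short lead days dominate early in the experiment.
rows = [
    ("0-7 days", "treatment", 10.0), ("0-7 days", "treatment", 10.0),
    ("0-7 days", "control", 8.0), ("0-7 days", "control", 8.0),
    ("30+ days", "treatment", 12.0), ("30+ days", "control", 8.0),
]
effects = stratum_effects(rows)
print(remixed_effect(effects, {"0-7 days": 0.25, "30+ days": 0.75}))  # 3.5
```

Here the per-bucket effects are 2.0 and 4.0, so weighting them 25/75 by the assumed long-run mix gives 3.5 instead of an estimate dominated by the over-represented short lead-day bookings.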

In marketing, the million-dollar question is: how much should you spend per channel? This can be reframed as a causal inference problem: how many incremental conversions does each channel drive?

When we look at marketing activities across Nielsen’s Designated Market Areas (DMAs), we find moderate to strong correlation across channels. This makes it hard to isolate the impact of one channel from another. In fact, when we include the correlated channels in the same regression, the coefficients flip signs for most channels, a clear sign of multicollinearity.
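A standard way to quantify this, not taken from the paper, is the variance inflation factor (VIF). With two channels it reduces to 1 / (1 − r²), where r is the Pearson correlation of the two channels across DMAs; values above roughly 10 are a common rule-of-thumb warning sign. The impression figures below are made up for illustration.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_channels(x: List[float], y: List[float]) -> float:
    """VIF for either of two regressors: 1 / (1 - r^2). Undefined (division by zero) if |r| == 1."""
    r = pearson(x, y)
    return 1.0 / (1.0 - r * r)


# Hypothetical impressions (millions) for two channels across four DMAs.
channel_a = [1.0, 2.0, 3.0, 4.0]
channel_b = [1.1, 1.9, 3.2, 3.9]
print(vif_two_channels(channel_a, channel_b))  # far above 10: the channels move together
```

A high VIF means the regression has little independent variation to separate the two channels, which is why their coefficients become unstable enough to flip sign.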

Existing solutions to multicollinearity, such as shrinkage estimators, principal component analysis, and partial linear regression, are particularly helpful for prediction problems but work less well for our use case, where we need to maintain business interpretability while isolating causality. Our approach, described in the paper Hierarchical Clustering as a Novel Solution to Multicollinearity, is to hierarchically cluster DMAs based on their similarity in marketing impressions over time. With this clustering, cross-channel correlation dropped by as much as 43%, and the channel coefficients no longer flip signs.
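The clustering step can be sketched as agglomerative clustering of DMA time series. The paper’s distance measure, linkage rule, number of clusters, and the way clusters enter the regression are not described here, so this sketch assumes 1 − Pearson correlation as the distance, average linkage, and a fixed cluster count; the DMA names and series are invented.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat series carries no co-movement information
    return sxy / math.sqrt(sxx * syy)


def cluster_dmas(series: Dict[str, List[float]], n_clusters: int) -> List[Set[str]]:
    """Average-linkage agglomerative clustering with distance = 1 - Pearson correlation."""
    dist = {}
    for a, b in combinations(series, 2):
        dist[(a, b)] = dist[(b, a)] = 1.0 - pearson(series[a], series[b])

    clusters = [{name} for name in series]  # start with one cluster per DMA
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean pairwise distance between members of the two clusters.
            d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge the closest pair (i < j, so deleting j is safe)
        del clusters[j]
    return clusters


weekly_impressions = {  # hypothetical weekly impression series per DMA
    "dma_a": [1, 2, 3, 4, 5],
    "dma_b": [2, 4, 6, 8, 11],
    "dma_c": [5, 4, 3, 2, 1],
    "dma_d": [10, 8, 6, 4, 3],
}
print(cluster_dmas(weekly_impressions, n_clusters=2))
```

For production-sized data, a library implementation of hierarchical clustering would replace this quadratic loop; the sketch only shows the grouping logic.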

Not only does our method provide an intuitive and effective solution to multicollinearity, it also avoids the need for complex transformations and preserves the interpretability of the data and the results throughout, enabling broad applications to causal inference problems.

We presented this paper at the new KDD workshop, Causal Inference and Machine Learning in Practice: Use cases for Product, Brand, Policy, and beyond. Airbnb’s Totte Harinen co-organized this workshop, which strongly resonated with KDD’s audience: it had 12 papers and 4 invited talks from 37 authors at 14 institutions.

In addition, we were invited to present two talks and one poster at KDD’s 2nd Workshop on End-End Customer Journey Optimization, and we joined the workshop’s panel discussion. One of these talks covered CLV (customer lifetime value) modeling. At Airbnb, we want to grow our brand and community by growing our entire user base. Our CLV ecosystem applies two frameworks:

  1. The value of Airbnb customers. We use traditional ML approaches along with research into more customer-lifecycle-focused architectures (e.g., hidden Markov models, or HMMs). We augment this with demand-supply incrementality modeling to properly account for guest and host contributions to value.
  2. The value growth that Airbnb delivers to customers. By accounting for the long-term incremental effects of booking on Airbnb, along with incremental contributions from marketing and attribution systems, we can measure incremental changes in CLV and optimize toward them.

Causal inference can also be applied to search. At the Customer Journey workshop, we presented our paper Low Inventory State: Identifying Under-Served Queries for Airbnb Search, which explored the problem of searches that return a small number of results. Whether that number is “too low” and will deter a guest from booking depends on the search parameters and the guest’s intent to book. For a given search query, we can use causal inference to determine the incremental effect of an additional result on the probability of booking. Our model outperforms non-causal methods and can also help with supply management.
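A crude version of the question “does one more result raise the chance of booking?” can be sketched by comparing booking rates at adjacent result counts within a stratum of similar searches. This is only a stand-in for the paper’s causal model, which is not reproduced here: the stratum key (a hypothetical destination-and-guest-count label) is the only thing adjusted for, and a real analysis would also need to handle confounders such as booking intent. The threshold for calling a query under-served is an illustrative assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def marginal_effects(searches: Iterable[Tuple[str, int, bool]]) -> Dict[Tuple[str, int], float]:
    """searches are (stratum, n_results, booked); returns P(book | n+1) - P(book | n) per stratum."""
    counts = defaultdict(lambda: [0, 0])  # (stratum, n_results) -> [bookings, searches]
    for stratum, n, booked in searches:
        counts[(stratum, n)][0] += int(booked)
        counts[(stratum, n)][1] += 1
    rate = {key: b / s for key, (b, s) in counts.items()}
    return {
        (stratum, n): rate[(stratum, n + 1)] - rate[(stratum, n)]
        for (stratum, n) in rate
        if (stratum, n + 1) in rate
    }


def under_served(effects: Dict[Tuple[str, int], float], threshold: float) -> Set[Tuple[str, int]]:
    """Flag (stratum, n_results) cells where one more result lifts the booking rate by more than threshold."""
    return {key for key, lift in effects.items() if lift > threshold}


logs = (
    [("beach|4 guests", 2, i < 1) for i in range(10)]    # 10% book with 2 results
    + [("beach|4 guests", 3, i < 3) for i in range(10)]  # 30% book with 3 results
    + [("city|2 guests", 2, i < 2) for i in range(10)]   # 20% book with 2 results
    + [("city|2 guests", 3, i < 2) for i in range(10)]   # 20% book with 3 results
)
print(under_served(marginal_effects(logs), threshold=0.05))  # {('beach|4 guests', 2)}
```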

Finally, our poster discussed how we measure the effects of national TV advertising campaigns. We combined TV exposure and demographic data with data on Airbnb onsite behavior using a third-party identity graph, which let us resolve disparate datasets to a unique identifier and model individual households.

We use propensity score matching to estimate TV effects, and then scale these estimates to a nationally representative population. We use this data to provide tactical insights for marketing and to understand how long TV effects take to decay.
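The matching step can be sketched as one-to-one nearest-neighbor matching on a propensity score, with replacement and a caliper, which yields an average treatment effect on the treated (ATT). We assume the propensity scores (each household’s estimated probability of TV exposure given its demographics) have already been fitted; the caliper of 0.05 and the sample values are illustrative, and the later step of scaling to a nationally representative population is not shown.

```python
from bisect import bisect_left
from typing import List, Tuple


def att_nearest_neighbor(
    treated: List[Tuple[float, float]],
    control: List[Tuple[float, float]],
    caliper: float = 0.05,
) -> Tuple[float, int]:
    """treated/control are (propensity_score, outcome). 1-NN matching with replacement.

    Returns (ATT estimate, number of matched treated units)."""
    control = sorted(control)  # sort by propensity score for binary search
    scores = [s for s, _ in control]
    diffs = []
    for s, y in treated:
        i = bisect_left(scores, s)
        # The nearest control is either just below or just above s in sorted order.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(scores)]
        j = min(candidates, key=lambda k: abs(scores[k] - s))
        if abs(scores[j] - s) <= caliper:  # drop treated units with no close match
            diffs.append(y - control[j][1])
    if not diffs:
        raise ValueError("no treated unit found a match within the caliper")
    return sum(diffs) / len(diffs), len(diffs)


exposed = [(0.30, 1.0), (0.60, 1.0), (0.95, 1.0)]   # (score, booked) for exposed households
unexposed = [(0.31, 0.0), (0.58, 1.0), (0.10, 0.0)]
print(att_nearest_neighbor(exposed, unexposed))  # (0.5, 2)
```

The household with score 0.95 has no unexposed neighbor within the caliper, so it is excluded; this keeps comparisons like-for-like at the cost of estimating the effect only for households that have comparable controls.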

The plot above (from simulated data for illustration) shows the results of an analysis of a TV campaign that ran from August to October. We can see that the campaign was effective at increasing bookings among households that saw an Airbnb TV ad, and that it was more effective for one subgroup (pink line) than for the other.

How can you achieve science at scale in a medium-to-large engineering organization? At KDD’s 2nd Workshop on Applied Machine Learning Management, we shared Onebrain, Airbnb’s solution for data science reproducibility and reuse. The core of Onebrain is a coding standard for configuring data science projects entirely in YAML. Onebrain’s backend abstracts away CI/CD, configuration/dependency management, and command-line parsing. Since it’s “just code,” Onebrain projects can be checked into a version-controlled repo, and any repo can be a Onebrain repo.
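To illustrate what abstracting away command-line parsing can look like, here is a minimal sketch in which a generic runner turns a declarative project config into a CLI. The config schema, the project name, and `build_cli` are hypothetical and are not Onebrain’s actual format; a Python dict stands in for the parsed YAML file so the example needs only the standard library.

```python
import argparse

# Hypothetical project config, standing in for a parsed YAML file (not Onebrain's real schema).
PROJECT = {
    "name": "listing-ctr-analysis",
    "entrypoints": {
        "train": {"params": {"learning_rate": {"type": "float", "default": 0.01},
                             "epochs": {"type": "int", "default": 5}}},
        "report": {"params": {"date": {"type": "str", "default": "2023-08-01"}}},
    },
}

TYPES = {"int": int, "float": float, "str": str}


def build_cli(project: dict) -> argparse.ArgumentParser:
    """Generate one subcommand per entrypoint and one flag per declared parameter."""
    parser = argparse.ArgumentParser(prog=project["name"])
    sub = parser.add_subparsers(dest="entrypoint", required=True)
    for name, spec in project["entrypoints"].items():
        p = sub.add_parser(name)
        for param, opts in spec["params"].items():
            # learning_rate -> --learning-rate; argparse maps it back to args.learning_rate
            p.add_argument(f"--{param.replace('_', '-')}",
                           type=TYPES[opts["type"]], default=opts["default"])
    return parser


args = build_cli(PROJECT).parse_args(["train", "--epochs", "10"])
print(args.entrypoint, args.learning_rate, args.epochs)
```

The project author only declares parameters; the generic runner owns argument parsing, so every project exposes a consistent interface.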

Users interact with Onebrain through a CLI. With a single command, anyone can use an existing project as a template for their own work, or generate a one-click URL to spin up a server and run the project. Usage is growing quickly, with over 200 distinct projects and over 500 users at Airbnb within just one year.

While most of our research focuses on higher-order data use cases like models, data capture is crucial because it’s the starting point for any analysis. Event logging libraries typically capture actions on and impressions of app components (buttons, sections, pages). But at this level of granularity, it can be difficult to abstract out user behavior, measure the total time spent on a surface, or understand the context surrounding an action.

At the 2nd Workshop on End-End Customer Journey Optimization, we spoke about a new type of client-side event called Sessions. Part of Airbnb’s client-side logging solution, Sessions provide a way to track user context and behavior within the Airbnb product. Unlike the traditional time-based sessions used in web analytics, these Sessions can be tied to various aspects of the Airbnb user experience. For example, they can be tied to specific surfaces like the checkout page, to API calls used for observability, or even to internal states of the app that abstract away complex UI components. The flexibility of Sessions allows us to capture a wide range of user interactions and better understand the user journey across our platform.
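A minimal sketch of the idea: sessions are opened and closed explicitly when the app enters or leaves a surface or state, rather than by an inactivity timeout, and every raw action logged while a session is open carries that session’s context. The class names, fields, and the `checkout_page` label are hypothetical and do not reflect Airbnb’s logging schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    name: str                  # surface, API call, or app state the session is tied to
    start_ms: int
    context: dict = field(default_factory=dict)
    end_ms: Optional[int] = None

    @property
    def duration_ms(self) -> Optional[int]:
        return None if self.end_ms is None else self.end_ms - self.start_ms


class SessionTracker:
    """Toy client-side tracker: sessions are opened/closed explicitly, not by inactivity timeouts."""

    def __init__(self) -> None:
        self.active: Dict[str, Session] = {}
        self.completed: List[Session] = []

    def start(self, name: str, now_ms: int, **context) -> None:
        self.active[name] = Session(name, now_ms, dict(context))

    def end(self, name: str, now_ms: int) -> Session:
        session = self.active.pop(name)
        session.end_ms = now_ms
        self.completed.append(session)
        return session

    def log_action(self, action: str, now_ms: int) -> dict:
        # Attach every currently open session (and its context) to the raw action event.
        return {
            "action": action,
            "ts_ms": now_ms,
            "sessions": {n: s.context for n, s in self.active.items()},
        }


tracker = SessionTracker()
tracker.start("checkout_page", now_ms=1_000, listing_id=123)
print(tracker.log_action("tap_reserve", now_ms=1_500))
print(tracker.end("checkout_page", now_ms=4_000).duration_ms)  # 3000
```

Summing `duration_ms` over completed sessions with the same name gives total time spent on a surface, one of the measurements that is awkward to derive from component-level events alone.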

KDD is a tremendous opportunity for data scientists from around the world, across industry and academia, to come together and exchange learnings and discoveries. We were honored to be invited to share methods we’ve developed through applied research at Airbnb. The techniques and insights we presented at KDD have been critical to improving Airbnb’s platform, business, and user experience. We’re constantly inspired by the innovations happening around us, and we’re thrilled to give back to the community and eager to see what new applications and advancements may come about as a result.

At the bottom of the page, you’ll find a full list of the talks and papers mentioned in this article, along with the team members who contributed. If you can see yourself on our team, we encourage you to apply for an open role today.

Optimizing Airbnb Search Journey with Multi-task Learning [link]

Authors: Chun How Tan, Austin Chan, Malay Haldar, Jie Tang, Xin Liu, Mustafa Abdool, Huiji Gao, Liwei He, Sanjeev Katariya

Variance Reduction Using In-Experiment Data: Efficient and Targeted Online Measurement for Sparse and Delayed Outcomes [link]

Authors: Alex Deng, Michelle Du, Anna Matlin, Qing Zhang

Beyond the Simple A/B test: Mitigating Interference Bias at Airbnb

Speaker: Ruben Lobel

The Price is Right: Removing A/B Test Bias in a Marketplace of Expirable Goods [link]

Authors: Thu Le, Alex Deng

Unveiling the Guest & Host Journey: Session-Based Instrumentation on Airbnb Platform

Speaker: Shant Torosean

Dedicated to Long-Term Journey: Growing Airbnb Through Measuring Customer Lifetime Value

Speakers: Sean O’Donell, Jason Cai, Linsha Chen

Low Inventory State: Identifying Under-Served Queries for Airbnb Search [link]

Authors: Toma Gulea, Bradley Turnbull

Measuring TV Campaigns at Airbnb

Speakers: Adam Maidman, Sam Barrows

Tutorial: Data-Centric AI [link]

Presenters: Daochen Zha, Huiji Gao

Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference [link]

Authors: Yufei Wu, Zhiying Gu, Alex Deng, Jacob Zhu, Linsha Chen

Onebrain — Microprojects for Data Science [link]

Authors: Daniel Miller, Alex Deng, Narek Amirbekian, Navin Sivanandam, Rodolfo Carboni