How Airbnb uses data-driven segmentation to understand supply availability patterns.
By: Alexandre Salama, Tim Abraham
At Airbnb, our supply comes from hosts who decide to list their spaces on our platform. Unlike traditional hotels, these spaces are not all interchangeable units in a building that are available to book year-round. Our hosts are people, with different earnings objectives and schedule constraints, which leads to different levels of availability to host. Understanding these differences is a key input into how we develop our products, campaigns, and operations.
Over the years, we’ve created various ways to measure host availability, developing “features” that capture different aspects of how and when listings are available. However, these features provide an incomplete picture when viewed in isolation. For example, a ~30% availability rate could indicate two very different scenarios: a host who only accepts bookings on weekends, or a host whose listing is only available during a specific season, such as summer.
This is where segmentation comes in.
By combining multiple features, segmentation allows us to create discrete categories that represent the different availability patterns of hosts.
But traditional segmentation methodologies, such as “RFM” (Recency, Frequency, Monetary), are focused on customer value rather than calendar dynamics, and are often limited to one-off analyses on small datasets. In contrast, we need an approach that can handle calendar data and daily inference for millions of listings.
To address the above challenges, this blog post explores how Airbnb used segmentation to better understand host behavior at scale. By enriching availability data with novel features and applying machine learning techniques, we developed a practical and scalable approach to segment availability for millions of listings daily.
Consider Alice and Max, two hosts with identical 2-bedroom apartments on Airbnb. Alice only lists her property in the summer, while Max has it available year-round, reflecting two distinct hosting styles.
Alice’s seasonal availability suggests that she might live in the property most of the time, only renting it out during the summer months. Airbnb can support her with seasonal pricing tips, onboarding guides for infrequent hosts, and settings suggestions.
Conversely, Max’s full-time availability indicates a more professional hosting style, possibly his primary income source. Airbnb can provide him with advanced booking analytics, tools for managing multiple reservations, and guidance on earnings and tax implications.
How can we create a dataset that captures these important differences in hosting behavior?
Availability Rate
A first step is to capture the host’s “intention to be available” on a specific night. Availability can be analyzed from either a backward-looking (in the past) or a forward-looking (in the future) perspective. For simplicity, this post focuses on backward-looking availability, as it reflects the final state of a calendar after all changes in inventory, bookings, and cancellations have occurred. Forward-looking availability is not as straightforward because changes can still happen between the analysis date and the future dates being analyzed.
We consider both:
- Nights Vacant: nights when the listing was listed as available for booking on Airbnb, and remained vacant.
- Nights Booked: nights when the listing was listed as available for booking on Airbnb, and was later booked on Airbnb.
Consequently, we can calculate the corresponding Nights Intended to be Available, or Nights Available, for the 365-day look-back period as the sum of Nights Vacant and Nights Booked. We then divide it by 365 to obtain the corresponding Availability Rate.
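As a minimal sketch of this calculation, the snippet below assumes each night in the look-back window carries one of three status labels. The labels ("vacant", "booked", "blocked") and the function name are hypothetical and chosen for illustration; they are not Airbnb's actual schema.

```python
from collections import Counter

def availability_rate(night_statuses, lookback_days=365):
    """Share of the look-back window the host intended to be available.

    `night_statuses` holds one hypothetical label per night:
    "vacant", "booked", or "blocked" (not offered for booking).
    """
    counts = Counter(night_statuses)
    # Nights Available = Nights Vacant + Nights Booked
    nights_available = counts["vacant"] + counts["booked"]
    return nights_available / lookback_days

# Illustrative calendar: 50 booked nights, 60 vacant nights, 255 blocked nights.
calendar = ["booked"] * 50 + ["vacant"] * 60 + ["blocked"] * 255
rate = availability_rate(calendar)  # 110 / 365, roughly 0.30
```

Booked nights count toward availability because the host offered them before they were reserved, which is what “intention to be available” is meant to capture.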
From the distribution of availability rates across listings, we observe:
- A substantial proportion of listings has little-to-no availability (~0% availability rate).
- Conversely, a significant proportion of listings has near-full availability (~100% availability rate).
- Between these extremes, a significant set of listings emerges without strong breakpoints.
How can we further differentiate those listings that fall in the middle range?
Streakiness
For listings that aren’t at either end of the spectrum, availability rate on its own is insufficient for capturing the nuances of how a listing is made available throughout the month. Consider listings A and B, which both have a 50% availability rate in a given month.
Although these listings have distinct availability patterns, they both have the same availability rate (50%)!
Listing A’s concentrated, block-like availability may lend itself to recommendations for weekly stay discounts, or advice for hosts who are away for a long stretch. Such guidance may not be suitable for Listing B.
To capture this difference, we introduce “Streakiness”. In the example above, Listing A had 1 long streak of availability which was interrupted on night 16, while Listing B had 8 short streaks of availability, each lasting 2 nights before a 2-night break.
We define a streak as a consecutive sequence of availability with a minimum of 2 consecutive nights, followed by a subsequent period of at least 2 consecutive nights of unavailability, as described in the diagram below. Note that we initially considered using a single night of availability/unavailability as a threshold but found it to be a less reliable signal of the consistency that streakiness aims to measure.
This leads us to the corresponding Streakiness feature, computed as the ratio of Streaks divided by the number of Nights Available (computed in the previous section). At this point, we have two relatively orthogonal features for our analysis: availability rate and streakiness.
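The streak definition can be sketched as follows. This is one literal reading of it, with two assumptions the post does not spell out: a run of available nights only counts if it is immediately followed by at least 2 unavailable nights, and a run that reaches the end of the window without such a break is not counted. The function names are hypothetical.

```python
from itertools import groupby

def count_streaks(nights, min_run=2, min_gap=2):
    """Count runs of >= min_run available nights followed by >= min_gap unavailable nights.

    `nights` is a sequence of truthy (available) / falsy (unavailable) flags.
    """
    # Collapse the calendar into alternating (is_available, run_length) pairs.
    runs = [(flag, sum(1 for _ in group)) for flag, group in groupby(nights, key=bool)]
    streaks = 0
    # Pair each run with the run that follows it; runs alternate by construction.
    for (is_available, length), (_, next_length) in zip(runs, runs[1:]):
        if is_available and length >= min_run and next_length >= min_gap:
            streaks += 1
    return streaks

def streakiness(nights):
    """Streaks divided by Nights Available (0.0 when nothing is available)."""
    nights_available = sum(1 for night in nights if night)
    if nights_available == 0:
        return 0.0
    return count_streaks(nights) / nights_available

listing_a = [1] * 15 + [0] * 15   # one long block, interrupted on night 16
listing_b = [1, 1, 0, 0] * 8      # eight 2-night streaks, each before a 2-night break
```

With these inputs, Listing A yields 1 streak and Listing B yields 8, so the two listings separate clearly on streakiness even though their availability rates match.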
Seasonality
We found that while availability and streakiness provide a solid basis for measuring volume and consistency, they don’t capture a calendar’s “compactness”, in other words, its seasonality. For example, consider Listings C and D, which both have around 15% availability and 14 streaks:
- Listing C concentrates its availability within a narrower block of time (summer season); see first calendar below.
- Listing D distributes its availability more evenly across multiple quarters; see second calendar below.
Seasonality plays an important role in Airbnb’s business, as guest demand and host availability fluctuate with changes in seasonal appeal, holidays, and local events. Given this, we propose to create a Quarters with at Least One Night of Availability feature.
Additionally, we create a Maximum Consecutive Months feature, which captures streakiness at a yearly scale by highlighting the longest continuous period a listing is available. Together, these features give clearer insight into seasonal patterns.
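Both seasonality features can be sketched from a list of available dates. The sketch assumes calendar quarters (January to March, and so on) and treats a month as available if it contains at least one available night; the post does not state the exact rule, so both choices and the function names are illustrative.

```python
from datetime import date

def quarters_with_availability(available_dates):
    """Number of calendar quarters (0 to 4) containing at least one available night."""
    return len({(d.month - 1) // 3 for d in available_dates})

def max_consecutive_months(available_dates):
    """Longest run of back-to-back calendar months that each contain an available night."""
    # Encode each month as a single integer so that December -> January stays consecutive.
    months = sorted({d.year * 12 + d.month for d in available_dates})
    best = current = 0
    previous = None
    for month in months:
        current = current + 1 if previous is not None and month == previous + 1 else 1
        best = max(best, current)
        previous = month
    return best

# Hypothetical calendars echoing Listings C (summer only) and D (spread across the year).
listing_c = [date(2023, 6, 10), date(2023, 7, 4), date(2023, 8, 20)]
listing_d = [date(2023, 1, 14), date(2023, 4, 8), date(2023, 7, 22), date(2023, 10, 7)]
```

Here the summer-only calendar touches 2 quarters with a 3-month run, while the spread-out calendar touches all 4 quarters but never strings two months together.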
Final dataset
The final feature set consists of all listings that were listed on the platform as of a broad set of dates. For each listing, we calculate the features we’ve designed in the previous sections. Then, we take a large, random sample across those dates. Finally, we scale the numerical features to ensure they’re on a comparable scale.
We can now apply a K-means clustering algorithm to identify segments, testing models with K values from 2 to 10. Using the elbow plot to find the optimal number of clusters, we select 8 clusters as the best representation of our data.
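The scaling and K-means sweep might look like the sketch below, which uses scikit-learn as an assumed toolkit (the post does not name a library). The feature matrix is random synthetic data standing in for the sampled listings, so the resulting inertia values say nothing about Airbnb's real clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_listings = 500

# Synthetic stand-in for the four features; column meanings are illustrative.
features = np.column_stack([
    rng.uniform(0.0, 1.0, n_listings),   # availability rate
    rng.uniform(0.0, 0.5, n_listings),   # streakiness
    rng.integers(0, 5, n_listings),      # quarters with at least one available night
    rng.integers(0, 13, n_listings),     # maximum consecutive months
])

# Put every feature on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(features)

# Fit one model per K and record the within-cluster sum of squares (inertia).
inertia_by_k = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertia_by_k[k] = model.inertia_
```

Plotting `inertia_by_k` against K gives the elbow plot: the chosen K is where adding more clusters stops producing a meaningful drop in inertia.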
We now have our clusters, but they don’t have names yet. Our cluster naming process involves several steps:
- Checking the distribution of each feature by cluster to identify strong differences (e.g., “cluster 1 has the highest availability rate”)
- Randomly sampling listings from every cluster and visualizing their calendars
- Iterating on naming with a cross-functional internal working group
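The first of these steps, comparing feature distributions by cluster, can be approximated with a per-cluster summary. The sketch below uses only means and a handful of made-up rows; the row layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def feature_means_by_cluster(rows, feature_names):
    """Average each feature within each cluster label."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)
    return {
        cluster: {name: mean(r[name] for r in members) for name in feature_names}
        for cluster, members in grouped.items()
    }

# Made-up rows: one dict per listing, with its assigned cluster and features.
rows = [
    {"cluster": 1, "availability_rate": 0.98, "streakiness": 0.01},
    {"cluster": 1, "availability_rate": 0.94, "streakiness": 0.03},
    {"cluster": 2, "availability_rate": 0.20, "streakiness": 0.10},
    {"cluster": 2, "availability_rate": 0.10, "streakiness": 0.20},
]
summary = feature_means_by_cluster(rows, ["availability_rate", "streakiness"])

# Which cluster has the highest average availability rate?
most_available = max(summary, key=lambda c: summary[c]["availability_rate"])
```

In practice a fuller picture (medians, quantiles, histograms) would be more informative than means alone, but the grouping logic is the same.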
The output of this process is summarized in the table below, while the following diagram displays a “typical” calendar for each cluster.
Since we’re measuring a latent attribute (underlying host behavior patterns that don’t have “ground truth” labels), there is no perfectly accurate way to validate our segmentation. However, we can use various methodologies to ensure that it “makes sense” from a business perspective, and reliably reflects real-life host behaviors.
We do so in three steps:
- A/B Testing
- Correlates of Availability Segments
- User Experience (UX) Research
A/B Testing
In an A/B test, we assessed how the different segments previously used a feature that encouraged hosts to complete “recommended actions” (e.g., letting guests book their home last-minute) so they could earn a monetary incentive.
We show the use of the feature by each segment below. These results align with our intuition: hosts who use Airbnb rarely or only for special occasions may not be interested in following recommendations, even when incentivized. Similarly, “Always On” hosts, who are already highly engaged and proactive in managing their listings, might prefer to rely on their own strategies rather than follow Airbnb’s suggestions. Hosts who fall somewhere in between, with moderate levels of engagement, may be the ideal target for incentives, as they are likely open to adjustments that could boost their performance.
Correlates of Availability Segments
We also validate our clusters by checking correlations with known attributes. For instance, we confirm that “Always On” listings are more likely to be managed by professionals, and that “Short Seasonal” listings are more common in ski or beach destinations.
Additionally, we know it is common to observe an increase in the number of listings around big events. As expected, we observe a rise in “Event Motivated” listings leading up to and during major event periods, reflecting hosts’ responsiveness to increased demand.
UX Research
Finally, the UX Research team conducts host surveys to create qualitative personas, which we compare against our clusters to ensure they align with real-world behavior. For instance, we verify whether segments with high weekend availability match hosts who self-report preferring weekend rentals.
Now, we have to scale this segmentation to all our listings.
To achieve this, we use a decision tree algorithm. We train a model using our 4 features, with cluster labels from our K-means model as outputs. We also perform a train-test split to make sure the model accurately predicts each cluster.
This new model provides a simple, interpretable set of if-else rules to classify listings into clusters. Using the decision tree structure, we translate the model’s logic into a SQL query by converting the decision tree’s “IF” conditions into “CASE WHEN” statements. This integration allows the model to be propagated in our data warehouse.
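One way to sketch this tree-to-SQL translation is shown below, again assuming scikit-learn. The data is synthetic, and the column names, the `cluster_id` alias, the tree depth, and the `tree_to_case_when` helper are all hypothetical; the post does not describe the actual tooling.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical warehouse column names for the four features.
FEATURES = ["availability_rate", "streakiness", "quarters_available", "max_consecutive_months"]

def tree_to_case_when(clf, feature_names):
    """Render a fitted decision tree as a SQL CASE expression, one WHEN per leaf."""
    tree = clf.tree_
    branches = []

    def walk(node, conditions):
        if tree.children_left[node] == -1:  # -1 marks a leaf in scikit-learn trees
            label = clf.classes_[np.argmax(tree.value[node][0])]
            predicate = " AND ".join(conditions) or "TRUE"
            branches.append(f"  WHEN {predicate} THEN {label}")
            return
        name = feature_names[tree.feature[node]]
        threshold = float(tree.threshold[node])
        # scikit-learn sends samples with feature <= threshold to the left child.
        walk(tree.children_left[node], conditions + [f"{name} <= {threshold:.6f}"])
        walk(tree.children_right[node], conditions + [f"{name} > {threshold:.6f}"])

    walk(0, [])
    return "CASE\n" + "\n".join(branches) + "\nEND AS cluster_id"

# Synthetic features and K-means labels standing in for the real sample.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, len(FEATURES)))
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)

test_accuracy = clf.score(X_test, y_test)  # held-out agreement with the K-means labels
sql = tree_to_case_when(clf, FEATURES)
```

Capping the tree depth keeps the generated CASE expression short and readable, at the cost of some agreement with the K-means labels. Note that the thresholds are rounded to six decimals when rendered, so listings sitting exactly on a split boundary could be classified differently in SQL than in the fitted model.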
At Airbnb, various teams leverage these segments: product teams to inform strategy and analyze heterogeneous treatment effects in A/B tests, marketing teams for targeted messaging, and UX research teams for insights into hosts’ motivations.
For instance, we identified an opportunity to boost Instant Book adoption among “Event Motivated” hosts, who may occasionally list their primary residence and prefer manual guest screening. Adding an option for hosts to only accept guests with a certain rating could make Instant Book more appealing to them, offering a balance between host control and booking efficiency.
Initially designed for listing availability data, this segmentation methodology has also been adapted to host activity data. We developed a second segmentation focused on days with “host engagement” (e.g., adjusting prices, updating policies, revising listing descriptions) to differentiate occasional “Settings Tinkerers” from frequent “Settings Optimizers.”
This approach can also be adapted to other industries where understanding temporal engagement is essential, for instance, to distinguish:
- Social Media: casual lurkers vs. active content creators
- Ridesharing: occasional drivers during peak demand vs. full-time drivers
- Streaming Services: nighttime streamers vs. continuous streamers
- E-commerce: sales/holiday enthusiasts vs. year-round shoppers
This blog post was a collaborative effort, with significant contributions from Tim Abraham, the main co-author. We’d also like to acknowledge the invaluable support of team members from multiple organizations, including (but not limited to) Regina Wu, Maggie Jarley, and Peter Coles.