May 18, 2024
Tiantian Zhang
The Airbnb Tech Blog

How we quantify brand perceptions from social media platforms via deep learning

By Tiantian Zhang, Shuai Shao (Shawn)

At Airbnb, we have developed Brandometer, a state-of-the-art natural language understanding (NLU) technique for understanding brand perception based on social media data.

Brand perception refers to the general feelings and experiences of customers with a company. Measuring brand perception quantitatively is an extremely challenging task. Traditionally, we rely on customer surveys to find out what customers think about a company. The downsides of such qualitative studies are sampling bias and limited data scale. Social media data, on the other hand, is the largest consumer database where users share their experiences, and is the ideal complementary data source for capturing brand perceptions.

Compared with traditional approaches that extract co-occurrence and count-based top associated topics, Brandometer learns word embeddings and uses embedding distances to measure the relatedness of brand perceptions (e.g., ‘belonging’, ‘connected’, ‘reliable’). Word embeddings represent words as real-valued vectors, and they perform well at preserving the semantic meanings and relatedness of words. Word embeddings obtained from deep neural networks are arguably among the most popular and revolutionary approaches in NLU. We explored a variety of word embedding models, from the quintessential algorithms Word2Vec and FastText to the latest language model DeBERTa, and compared them in terms of generating reliable brand perception scores.

For concepts represented as words, we use the similarity between a word’s embedding and that of “Airbnb” to measure how important the concept is with respect to the Airbnb brand, which we refer to as the Perception Score. Brand perception is defined as the cosine similarity between “Airbnb” and the specific keyword:

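$$\mathrm{CosSim}(w) = \frac{\mathbf{v}_{\mathrm{Airbnb}} \cdot \mathbf{v}_w}{\lVert \mathbf{v}_{\mathrm{Airbnb}} \rVert \, \lVert \mathbf{v}_w \rVert} \qquad \text{(Eq. 1)}$$

where $\mathbf{v}_{\mathrm{Airbnb}}$ and $\mathbf{v}_w$ are the word embeddings of “Airbnb” and of the keyword $w$.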
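A minimal sketch of Eq. 1 in Python with NumPy, assuming the trained model’s embeddings have been exported to a plain dictionary from token to vector and that the brand token is lowercased to `airbnb` (both are illustrative assumptions; the toy vectors are made up):

```python
import numpy as np


def perception_score(embeddings: dict[str, np.ndarray], keyword: str, brand: str = "airbnb") -> float:
    """Cosine similarity between the brand embedding and a keyword embedding (Eq. 1)."""
    v_brand = embeddings[brand]
    v_word = embeddings[keyword]
    return float(np.dot(v_brand, v_word) / (np.linalg.norm(v_brand) * np.linalg.norm(v_word)))


# Toy 3-dimensional embeddings; real ones would come from the trained model.
toy = {
    "airbnb": np.array([1.0, 0.0, 1.0]),
    "belonging": np.array([1.0, 0.0, 0.0]),
    "mall": np.array([0.0, 1.0, 0.0]),
}
print(round(perception_score(toy, "belonging"), 4))  # 0.7071
print(round(perception_score(toy, "mall"), 4))       # 0.0
```

Because cosine similarity ignores vector length, the score reflects only the direction of each embedding, which is what makes it comparable across words with very different frequencies.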

In this blog post, we’ll introduce how we process and understand social media data, how we capture brand perceptions via deep learning, and how we ‘convert’ the cosine similarities into calibrated Brandometer metrics. We will also share the insights derived from Brandometer metrics.

Problem Setup and Data

To measure brand perception on social media, we assessed all Airbnb-related mentions from 19 platforms (e.g., X (formerly known as Twitter), Facebook, Reddit, etc.) and generated word embeddings with state-of-the-art models.

To use social media data to generate meaningful word embeddings for measuring brand perception, we had to overcome two challenges:

  • Quality: Social media posts are mostly user-generated, with diverse content such as status updates and opinions, and can be very noisy.
  • Quantity: Sparsity of social media posts is another challenge. Because it usually takes some time for social media users to generate data in response to certain activities and events, a monthly rolling window strikes a balance between promptness and detectability. Our monthly dataset is relatively small (around 20 million words) compared with a typical dataset used to train good-quality word embeddings (e.g., about 100 billion words for the Google News Word2Vec model). Warm-starting from pre-trained models did not help, because the in-domain data barely moved the learned embeddings.
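One simple way to form monthly windows is to bucket posts by calendar month. This sketch assumes each post carries a publication date; the `bucket_by_month` helper is hypothetical, not part of Brandometer:

```python
from collections import defaultdict
from datetime import date


def bucket_by_month(posts: list[tuple[date, str]]) -> dict[tuple[int, int], list[str]]:
    """Group (date, text) posts into calendar-month windows keyed by (year, month)."""
    buckets: dict[tuple[int, int], list[str]] = defaultdict(list)
    for day, text in posts:
        buckets[(day.year, day.month)].append(text)
    return dict(buckets)


posts = [
    (date(2024, 3, 30), "great host"),
    (date(2024, 4, 2), "late check-in"),
    (date(2024, 3, 1), "cozy cabin"),
]
print(bucket_by_month(posts))
# {(2024, 3): ['great host', 'cozy cabin'], (2024, 4): ['late check-in']}
```

Each bucket would then be the training corpus for that month’s embedding models.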

We developed several data cleaning processes to improve data quality. At the same time, we innovated on modeling techniques to mitigate the impact of limited data quantity and quality on word embedding quality.
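The specific cleaning steps aren’t listed here, so the following is only a hypothetical example of common social media normalization: lowercasing, stripping URLs and @handles, unwrapping hashtags, and dropping very short posts and exact duplicates such as reposts. The helper names and the `min_tokens` threshold are illustrative assumptions:

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Normalize one post (hypothetical steps, not Airbnb's actual pipeline)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user handles
    text = text.replace("#", " ")     # keep hashtag words, drop the '#'
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_corpus(posts: list[str], min_tokens: int = 3) -> list[str]:
    """Clean posts, drop very short ones, and remove exact duplicates (e.g., reposts)."""
    seen: set[str] = set()
    cleaned = []
    for post in posts:
        c = clean_post(post)
        if len(c.split()) < min_tokens or c in seen:
            continue
        seen.add(c)
        cleaned.append(c)
    return cleaned


posts = [
    "Loved my #Airbnb stay! https://example.com/x @host123",
    "loved my airbnb stay!",
    "ok",
]
print(clean_corpus(posts))  # ['loved my airbnb stay!']
```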

In addition to the data work, we explored and compared several word embedding training techniques with the goal of generating reliable brand perception scores.


Word2Vec has been by far the simplest and most widely used word embedding model since 2013. We started by building CBOW-based Word2Vec models using Gensim. Word2Vec produced decent in-domain word embeddings and, more importantly, captured analogies. In our domain-specific word embeddings, we are able to capture analogies in the Airbnb domain, such as “host” - “provide” + “guest” ≈ “need” and “city” - “mall” + “nature” ≈ “park”.
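The analogy arithmetic can be sketched with NumPy as a vector offset followed by a nearest-neighbor search by cosine similarity (the 3CosAdd method). With a trained Gensim model, `model.wv.most_similar(positive=["city", "nature"], negative=["mall"])` runs a comparable query. The 4-dimensional toy vectors below are invented so the example is self-contained; real Word2Vec vectors are learned from the corpus:

```python
import numpy as np


def unit(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)


def analogy(vectors: dict[str, np.ndarray], positive: list[str], negative: list[str]) -> str:
    """3CosAdd: word whose vector is most cosine-similar to sum(positive) - sum(negative)."""
    target = unit(sum(unit(vectors[w]) for w in positive) - sum(unit(vectors[w]) for w in negative))
    # Exclude the query words, which would otherwise often be their own nearest neighbors.
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: float(unit(vectors[w]) @ target))


# Invented 4-d vectors with axes (urban, outdoor, shopping, place), for illustration only.
toy = {
    "city": np.array([1.0, 0.0, 0.0, 1.0]),
    "mall": np.array([1.0, 0.0, 1.0, 0.0]),
    "nature": np.array([0.0, 1.0, 0.0, 0.0]),
    "park": np.array([0.0, 1.0, 0.0, 1.0]),
    "forest": np.array([0.0, 1.0, 0.0, 0.2]),
    "store": np.array([0.0, 0.0, 1.0, 1.0]),
}
print(analogy(toy, positive=["city", "nature"], negative=["mall"]))  # park
```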


FastText takes into account the internal structure of words and is more robust to out-of-vocabulary words and smaller datasets. Moreover, inspired by Sense2Vec, we associate words with sentiments (i.e., POSITIVE, NEGATIVE, NEUTRAL), which forms brand perception concepts at the sentiment level.
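A Sense2Vec-style sketch of sentiment-aware tokens, assuming each post already has a post-level sentiment label from a separate classifier and that every token in the post inherits that label (both are assumptions; how sentiments are assigned isn’t specified). Keeping the brand token untagged lets the perception score still be computed against a single “airbnb” vector:

```python
SENTIMENTS = frozenset({"POSITIVE", "NEGATIVE", "NEUTRAL"})


def tag_with_sentiment(
    tokens: list[str], sentiment: str, untagged: frozenset[str] = frozenset({"airbnb"})
) -> list[str]:
    """Append the post's sentiment to each token, Sense2Vec style ("word|LABEL")."""
    if sentiment not in SENTIMENTS:
        raise ValueError(f"unknown sentiment label: {sentiment}")
    # The brand token stays untagged so all sentiment variants are scored against one vector.
    return [tok if tok in untagged else f"{tok}|{sentiment}" for tok in tokens]


print(tag_with_sentiment(["airbnb", "host", "was", "reliable"], "POSITIVE"))
# ['airbnb', 'host|POSITIVE', 'was|POSITIVE', 'reliable|POSITIVE']
```

The tagged token lists could then serve as FastText training sentences, so that, for example, “reliable|POSITIVE” and “reliable|NEGATIVE” become separate vocabulary entries with their own vectors.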


Recent progress in transformer-based language models (e.g., BERT) has significantly improved performance on NLU tasks, with the advantage of generating contextualized word embeddings. We developed DeBERTa-based word embeddings; DeBERTa works better with smaller datasets and pays more attention to surrounding context via its disentangled attention mechanism. We trained everything from scratch (including the tokenizer) using Hugging Face Transformers, and the concatenated last attention layer embeddings produced the best word embeddings for our case.
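How contextual vectors become per-word embeddings isn’t detailed here, so this is one hypothetical reading of “concatenated last attention layer embeddings”: concatenate the hidden states of the last few layers for every token, then mean-pool the subword tokens that make up each word. The inputs mirror what Hugging Face Transformers can return (per-layer hidden states via `output_hidden_states=True`, and a fast tokenizer’s `word_ids()` mapping), but here they are plain NumPy arrays of random values, and `last_k=4` is an assumed setting:

```python
from typing import Optional

import numpy as np


def word_vectors_from_layers(
    hidden_states: list[np.ndarray], word_ids: list[Optional[int]], last_k: int = 4
) -> dict[int, np.ndarray]:
    """Concatenate the last `last_k` layers per token, then mean-pool each word's subword tokens.

    hidden_states: one (seq_len, dim) array per layer for a single sentence.
    word_ids: word index for each token position, or None for special tokens.
    """
    token_vecs = np.concatenate(hidden_states[-last_k:], axis=-1)  # (seq_len, last_k * dim)
    pieces: dict[int, list[np.ndarray]] = {}
    for pos, wid in enumerate(word_ids):
        if wid is not None:
            pieces.setdefault(wid, []).append(token_vecs[pos])
    return {wid: np.mean(vecs, axis=0) for wid, vecs in pieces.items()}


# Random stand-ins: 6 layers, 5 token positions, hidden size 8.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(5, 8)) for _ in range(6)]
word_ids = [None, 0, 0, 1, None]  # e.g. [CLS], two pieces of one word, a one-piece word, [SEP]
vectors = word_vectors_from_layers(layers, word_ids, last_k=4)
print({wid: v.shape for wid, v in vectors.items()})  # {0: (32,), 1: (32,)}
```

To get a single static vector per word for the perception score, one could further average these pooled vectors over all occurrences of the word in the month’s corpus; that aggregation step is likewise an assumption.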

Brand Perception Score Stabilization and Calibration

The variability of word embeddings has been widely studied (Borah, 2021). Its causes range from the underlying stochastic nature of deep learning models (e.g., random initialization of word embeddings, or embedding training that converges to a local optimum of a global optimization criterion) to changes in the quantity and quality of the data corpus over time.

With Brandometer, we need to reduce the variability in embedding distances to produce stable time series for tracking. Stable embedding distances help preserve the inherent patterns and structures in the time series data, and hence contribute to better predictability of the tracking process. They also make the tracking process more robust to noisy fluctuations. We studied the influential factors and took the following steps to reduce variability:

  1. Score averaging over repetitive training with bootstrap sampling
  2. Rank-based perception score
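The rank-based score isn’t defined in this section, so the following is only one plausible form, offered as an assumption: replace each word’s cosine similarity to “Airbnb” with its percentile rank within the vocabulary. Ranks are unchanged by any monotonic rescaling of the similarities, which is the kind of scale drift that separately trained models can show from month to month. The similarity values below are toy numbers:

```python
def rank_scores(similarities: dict[str, float]) -> dict[str, float]:
    """Percentile rank of each word's CosSim within the vocabulary (assumed rank-based score).

    1.0 means most similar to "Airbnb", 0.0 least similar; tied words share the midpoint rank.
    """
    n = len(similarities)
    if n < 2:
        raise ValueError("need at least two words to rank")
    values = list(similarities.values())
    scores = {}
    for word, s in similarities.items():
        lower = sum(v < s for v in values)
        ties = sum(v == s for v in values) - 1  # exclude the word itself
        scores[word] = (lower + 0.5 * ties) / (n - 1)
    return scores  # O(|V|^2): fine for a sketch; a sort-based ranking scales better


month_a = {"belonging": 0.62, "reliable": 0.41, "mall": 0.05}  # toy values
month_b = {w: 0.5 * s + 0.2 for w, s in month_a.items()}       # same ordering, shifted scale
print(rank_scores(month_a) == rank_scores(month_b))  # True
print(rank_scores(month_a))  # {'belonging': 1.0, 'reliable': 0.5, 'mall': 0.0}
```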

Score averaging over repetitive training with upsampling

For each month’s data, we trained N models with the same hyper-parameters and took the average of the N perception scores as the final score for each concept. We also upsampled the data to make sure that each model iterated over an equal number of data points across months.
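A sketch of the averaging loop, assuming that upsampling means resampling each month’s posts with replacement (bootstrap style) to a fixed corpus size. `train_and_score` is a hypothetical callable standing in for training one embedding model and returning the Eq. 1 score per keyword; `target_size` and the seed handling are illustrative choices:

```python
import random
from statistics import mean
from typing import Callable


def upsample(posts: list[str], target_size: int, rng: random.Random) -> list[str]:
    """Draw `target_size` posts with replacement so every month yields an equal-sized corpus."""
    return rng.choices(posts, k=target_size)


def averaged_scores(
    posts: list[str],
    keywords: list[str],
    train_and_score: Callable[[list[str], int], dict[str, float]],  # hypothetical trainer
    n_models: int = 30,
    target_size: int = 100_000,
    seed: int = 0,
) -> dict[str, float]:
    """Train `n_models` models on independently upsampled corpora and average each keyword's score."""
    rng = random.Random(seed)
    runs = []
    for model_seed in range(n_models):
        corpus = upsample(posts, target_size, rng)
        runs.append(train_and_score(corpus, model_seed))  # one CosSim dict per model
    return {kw: mean(run[kw] for run in runs) for kw in keywords}


# Usage with a fake trainer whose score wobbles with the model seed, standing in for real training.
def fake_train_and_score(corpus: list[str], model_seed: int) -> dict[str, float]:
    return {"belonging": 0.5 + 0.01 * (model_seed % 3 - 1)}


print(averaged_scores(["post a", "post b"], ["belonging"], fake_train_and_score, target_size=10))
```

Drawing a fresh bootstrap sample for every model means the average smooths over both training randomness and sampling noise, while the fixed `target_size` keeps the number of training examples comparable across months of different sizes.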

We defined variability as:


where

CosSim(w) refers to the cosine-similarity-based perception score defined in Eq. 1, A refers to the algorithm, M refers to the time window (i.e., month), V refers to the vocabulary and |V| is the vocabulary size, and n refers to the number of repeatedly trained models.

As N approaches 30, the variability values converge and settle within a narrow interval. Hence, we picked N = 30 throughout.
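The variability equation isn’t reproduced in this text, so the sketch below assumes one natural form: for a fixed algorithm A and month M, take the standard deviation of each word’s CosSim across the n models, then average over the |V| words. It also shows a hypothetical rule for picking N, namely the smallest n after which the variability curve stays within a tolerance. Both the formula and the `tol` threshold are assumptions, and the simulated scores are random numbers, not Brandometer results:

```python
import numpy as np


def variability(scores: np.ndarray) -> float:
    """Assumed variability for one algorithm A and one month M.

    scores has shape (n, |V|): row i holds model i's CosSim for every vocabulary word.
    Returns the across-model standard deviation per word, averaged over the vocabulary.
    """
    if scores.shape[0] < 2:
        raise ValueError("need at least two models to measure variability")
    return float(np.std(scores, axis=0, ddof=1).mean())


def variability_curve(scores: np.ndarray) -> list[float]:
    """Variability computed from the first n models, for n = 2 .. total number of models."""
    return [variability(scores[:n]) for n in range(2, scores.shape[0] + 1)]


def smallest_stable_n(curve: list[float], tol: float = 1e-3) -> int:
    """Hypothetical rule: smallest n whose variability stays within `tol` for every larger n."""
    for i, v in enumerate(curve):
        if all(abs(w - v) <= tol for w in curve[i:]):
            return i + 2  # curve[0] corresponds to n = 2
    raise ValueError("curve is empty")


# Simulated stand-in: 40 noisy runs over the same 500-word vocabulary.
rng = np.random.default_rng(42)
true_scores = rng.uniform(-0.2, 0.8, size=500)
simulated = true_scores + rng.normal(scale=0.05, size=(40, 500))
print(smallest_stable_n(variability_curve(simulated), tol=2e-3))
```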