LinkedIn: A High Performance, Advanced Analytics Pioneer

Elizabeth Mixson
06/08/2021

Sponsored Content


When was the last time you embarked on a job search without turning to LinkedIn as a key resource? Over the past 18 years, LinkedIn has evolved from a professional networking site into an $8.5 billion social media powerhouse.

With over 740 million members and 55 million registered companies, LinkedIn collects vast amounts of data pertaining to:

  • job search and hiring
  • sales & marketing 
  • trending current events and articles
  • professional training and upskilling

And this is only the tip of the iceberg. In fact, as far back as 2013, LinkedIn’s CEO at the time, Jeff Weiner, explained, “Our ultimate dream is to develop the world’s first economic graph, a sort of digital map of skills, workers and jobs across the global economy. Ambitions, in other words, that are a far cry from the industry’s early stabs at modernising the old-fashioned jobs board.”

Though entire textbooks could be written on LinkedIn’s sophisticated approach to big data analytics, today we’re going to look at some of its more recent enterprise data achievements. 

 

Using a Heterogeneous Recommendation System to Expand Human Networks and Minds

Any basic LinkedIn user is familiar with its People You May Know (PYMK) feature, a recommendation engine that suggests who you should connect with next based on shared connections and work history.

However, as LinkedIn’s platform has evolved to include so much more than just resumes and job listings, its recommendation engine has had to evolve too. Now, in addition to listing potential contacts, PYMK also showcases relevant hashtags, companies, groups, newsletters, and events.

With this in mind, LinkedIn’s data scientists and engineers set out to build a heterogeneous recommendation system: a recommendation engine capable of analyzing multiple types of entities in multiple ways.

As explained on LinkedIn’s tech blog, “We develop an XGBoost model that predicts the probability of downstream-interactions of a member with top-k entities (entities occupying first k positions) within a cohort and ranks the cohorts using this probability score. A like, comment, or a re-share on content produced by the entity is counted as a downstream interaction; so for a connection-edge, this would mean number of likes, comments, or re-shares on the content posted by the connection. This model trains against a logistic loss with binary labels (corresponding to if there was a downstream-interaction or not) and uses calibrated scores from Edge-FPRs as features in addition to other member-level features. Further, we also design counterfactual experiments to estimate the relative importance of each edge type in the form of importance factors that are multiplied to the scores of the corresponding cohorts.”
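The final ranking step described in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Cohort` class, the `rank_cohorts` helper, the edge-type names, and every number below are hypothetical, and the probabilities stand in for the output of the trained XGBoost model rather than coming from one.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    edge_type: str           # hypothetical labels, e.g. "connection", "hashtag"
    interaction_prob: float  # stand-in for the model's predicted probability


# Hypothetical importance factors per edge type (illustrative values only;
# the quote says the real ones are estimated from counterfactual experiments).
IMPORTANCE = {"connection": 1.0, "hashtag": 0.8, "company": 0.6}


def rank_cohorts(cohorts, importance):
    """Sort cohorts by probability multiplied by importance factor, highest first."""
    return sorted(
        cohorts,
        key=lambda c: c.interaction_prob * importance.get(c.edge_type, 1.0),
        reverse=True,
    )


cohorts = [
    Cohort("connection", 0.30),
    Cohort("hashtag", 0.45),
    Cohort("company", 0.40),
]
ranked = rank_cohorts(cohorts, IMPORTANCE)
ranked_types = [c.edge_type for c in ranked]
print(ranked_types)
```

In this toy input the company cohort has a higher raw probability than the connection cohort, yet it ranks last once its lower importance factor is applied, which is the effect the multiplication is meant to have.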

 

Greykite: LinkedIn’s Open Source Predictive Analytics Library

LinkedIn recently launched its Greykite open source forecasting library to promote community innovation by sharing its “fast, accurate, and highly customizable algorithms for forecasting.” The star of the show, LinkedIn’s flagship algorithm, Silverkite, provides time series forecasts that can be used for resource planning, performance management, optimization, and ecosystem insight generation.

Currently, at LinkedIn, Silverkite is being used to:

  • Predict traffic patterns based on seasonality, events/holidays, and/or short-range effects
  • Establish and track operational KPIs
  • Forecast market growth and budgetary needs
  • Understand which countries are recovering faster or slower after a shock like the COVID-19 pandemic

*Image sourced from https://engineering.linkedin.com/blog/2021/greykite--a-flexible--intuitive--and-fast-forecasting-library 
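To make the idea of forecasting traffic from seasonality concrete, here is a deliberately simple weekly-seasonal baseline in plain Python. It is not Silverkite and does not use the Greykite API; the function name `weekly_seasonal_forecast` and the traffic numbers are made up for illustration, and it ignores the events, holidays, and short-range effects mentioned above.

```python
from collections import defaultdict
from statistics import mean


def weekly_seasonal_forecast(history, horizon):
    """Forecast each future day as the average of past days in the same weekly slot.

    history: daily values, oldest first; index 0 is treated as weekly slot 0.
             Needs at least 7 values so that every slot has been observed.
    horizon: number of future days to forecast.
    """
    by_slot = defaultdict(list)
    for i, value in enumerate(history):
        by_slot[i % 7].append(value)
    slot_mean = {slot: mean(values) for slot, values in by_slot.items()}
    start = len(history)
    return [slot_mean[(start + h) % 7] for h in range(horizon)]


# Two weeks of made-up daily traffic: busy weekdays, quiet weekends.
history = [100, 110, 120, 110, 100, 40, 30,
           120, 130, 140, 130, 120, 60, 50]
forecast = weekly_seasonal_forecast(history, horizon=7)
print(forecast)
```

A baseline like this captures only a repeating weekly shape, which is useful mainly as a reference point when judging whether a richer model is adding value.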

 

Responsible AI

LinkedIn’s AI strategy prioritizes fair, anti-bias AI and data privacy. The goal is to ensure that two members of equal talent have equal access to opportunities without compromising member privacy.

In order to support these objectives, LinkedIn has developed a number of new tools and solutions. For example, LinkedIn’s Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness and the mitigation of bias in large-scale machine learning workflows. In other words, it can be used to measure biases in training data, evaluate different fairness notions for ML models, and detect statistically significant differences in model performance across different subgroups.
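One common fairness notion, demographic parity, can be measured with very little code. The sketch below is plain Python rather than LiFT's Scala/Spark API, and the helper names, group labels, and predictions are all hypothetical; it only shows what comparing a model's positive-prediction rate across subgroups might look like.

```python
from collections import defaultdict


def positive_rate_by_group(groups, predictions):
    """Return the share of positive (1) predictions for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, prediction in zip(groups, predictions):
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(groups, predictions):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Made-up data: group labels and binary model predictions for eight members.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, predictions)
gap = demographic_parity_gap(groups, predictions)
print(rates, gap)
```

A gap of zero would mean both groups receive positive predictions at the same rate. Whether a non-zero gap is statistically significant is a separate question that a toolkit such as LiFT is designed to help answer.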

When it comes to data privacy, LinkedIn uses differential privacy, among other techniques, to aggregate data insights and share them with the world without compromising the privacy of the members in the dataset. Using this approach, LinkedIn was able to build its job trend visualizations and other data wrapping products that help its users make smarter hiring and job search decisions.
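For readers unfamiliar with differential privacy, the textbook Laplace mechanism gives a feel for how an aggregate can be published with calibrated noise. The sketch below is an assumption-laden illustration, not a description of LinkedIn's implementation: the function names, the count of 1,200, and the epsilon value are all invented for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    # expovariate takes a rate, so a rate of 1/scale gives a mean of `scale`.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return the count plus Laplace noise with scale sensitivity / epsilon.

    sensitivity=1.0 assumes that adding or removing one member changes
    the count by at most 1.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only so the example is repeatable
noisy = private_count(1200, epsilon=0.5, rng=rng)
print(round(noisy, 2))
```

Smaller epsilon values add more noise and give stronger privacy. Because the noise is centered on zero, the published figure remains close to the truth for large counts while masking any single member's contribution.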

 

