Confronting ‘Trickster’ Figures In Market Metrics


As published by Hospitality Upgrade Magazine in its Summer 2019 issue

Numbers are tricky business… Data have arguably become the most valuable currency in business today. They are the food that nourishes companies and gives them the sustenance to act and react. They underpin the sky-high valuations of firms like Facebook; without them, AI would be just two letters of the alphabet. Data have become the accepted crutch that supports nearly all decision-making. “What do the data say?” For business leaders, when it comes to issues like profitability, market share or brand equity, the more facts, figures and information they have, the better they feel.

Huge faith is placed in data, often blindly. People want to believe the black and white figures in front of them; it feels safe and reassuring. In the hospitality sector alone, data have spawned a whole industry of firms that specialize in collecting, housing, distributing, compiling and helping users to understand data – data related to bookings, food cost, productivity, guest reviews, water usage, market demand, average spend, CRM, etc.

Are data always our friends, however? Consumer group Which?, for example, recently investigated customer ratings on Amazon and found that the site was inundated with fake five-star reviews. Can data always be trusted? Lying in wait to confuse or sabotage the conclusions that business leaders draw from data is the ‘Trickster’, one of several universal forces or patterns within the human mind that psychiatrist Carl Jung termed archetypes, with this one specifically representing the irrational, chaotic and unpredictable side of thought and behaviour.

These manifestations of chaos exist on balance sheets. Marketers are “betwixt and between” reality and fantasy in their interpretation and actioning of market metrics, and are quite likely completely unaware of it. Best-practice market intelligence in sales and marketing often derives from consumer feedback, such as formal satisfaction surveys, TripAdvisor ratings, Michelin star ratings or the ‘Net Promoter Score’ (NPS), which is often held up as a best-practice metric. This latter index gauges the loyalty of a company’s customer relationships, and some research indicates that it is correlated with revenue growth. NPS values range from −100 (every respondent is a “detractor”) to +100 (every respondent is a “promoter”). They vary by industry, but a positive NPS (i.e., higher than zero) is generally deemed good, an NPS of +50 is generally deemed excellent, and scores over +70 are exceptional.
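
As a rough illustration of the arithmetic behind those ranges, the sketch below computes an NPS from raw survey answers. It assumes the conventional definition, which this article does not spell out: respondents rate their likelihood to recommend on a 0 to 10 scale, 9 and 10 count as promoters, 0 through 6 count as detractors, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings):
    """Return the NPS (-100 to +100) for a list of 0-10 ratings.

    Assumes the conventional cut-offs: 9-10 = promoter, 0-6 = detractor,
    7-8 = passive (counted in the total but in neither group).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical survey: 5 promoters, 3 passives, 2 detractors.
sample = [10, 9, 9, 8, 7, 7, 6, 4, 10, 9]
print(net_promoter_score(sample))  # 30.0
```

Note that the passives (7 and 8) influence the result only through the total count, which is one reason two very different sets of answers can produce the same headline score.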

Such scores look straightforward, but leading-edge research in psychometrics, tests and measurements increasingly suggests that market-driven metrics like these are often and quite literally Trickster ‘figures.’

What You See is Rarely What You Get

Lurking under NPS scores can be critical consumer-based nuances or market-based idiosyncrasies that can distort responses, skew trends and promote misleading conclusions. Specifically, traditional consumer ratings, like NPS, have at least four inherent limitations:

  • The usual approach of summing scores of a test or rating does not provide linear (i.e., interval-level) measures of the underlying variable, and group differences and treatment effects are distorted.
  • Traditional scaling approaches treat all questionnaire items as equivalent, thereby making it difficult to select those questions that are most appropriate for a specific population of respondents.
  • The standard ‘raw score’ approach does not recognize that some items may be biased such that respondents with identical trait levels receive systematically different scores, e.g., younger respondents endorse some questions more (or less) often than older respondents with equal trait levels.
  • Traditional scaling approaches offer no indicators of the internal validity of respondents’ scores. Aberrant response records cannot be identified; it cannot be determined whether low scores are due to low trait levels or to respondents’ misunderstanding or incomplete processing of the questions.

Moreover, there’s a sinister fifth limitation related to rater-severity. Put simply, not everyone uses ratings or scoring systems in the same way — some people are harder or easier to please than others. Think back to your school report card and you will recall that, although working off the same grading system, certain teachers were lenient in their scoring while others were strict.

This type of respondent bias unfortunately also applies to consumer ratings like TripAdvisor reviews and NPS scores. Many marketers know this intuitively. For example, British consumers seem more difficult to please, so they give systematically lower (i.e., stricter) ratings on products or services. In contrast, American consumers are relatively easier to please, so they give systematically higher (i.e., more lenient) ratings on the same products or services. The net effect is that both consumer sets are using the same NPS scheme but in significantly different ways, thereby not producing a true apples-to-apples comparison.
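
To see how much a severity difference alone can move the headline number, consider the hedged sketch below. It takes one hypothetical set of underlying experiences and re-scores it as a stricter group and a more lenient group might. The one-point shift is an invented assumption for illustration only, not a measured figure for any nationality, and the NPS cut-offs (9 to 10 promoter, 0 to 6 detractor) are the conventional ones rather than something defined in this article.

```python
def nps(ratings):
    """Conventional NPS: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def apply_severity(ratings, shift):
    """Shift every rating by a fixed amount, clamped to the 0-10 scale.

    A negative shift mimics a stricter group, a positive one a more
    lenient group. A constant shift is a deliberate oversimplification.
    """
    return [min(10, max(0, r + shift)) for r in ratings]


# Hypothetical "true" experiences, identical for both groups.
underlying = [10, 9, 9, 8, 8, 7, 7, 6, 5, 9]

strict = apply_severity(underlying, -1)   # harder-to-please raters
lenient = apply_severity(underlying, +1)  # easier-to-please raters

print(nps(underlying))  # 20.0
print(nps(strict))      # -30.0
print(nps(lenient))     # 50.0
```

In this toy example the same experiences read as anything from a poor score to an "excellent" one, purely because of how the raters use the scale.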

This problem is extremely pervasive, according to Rense Lange, a Ph.D. statistician with the Polytechnic Institute of Management and Technology in Portugal and a published expert in psychometrics and quality control models. “Over 20 years of scientific research demonstrates that raters consistently differ in severity, even when using the same scale or review system.” So, how can organizations that rely on consumer ratings manage these severity biases so that their data-mining accurately guides business decisions?

Dr. Lange spoke to Aethos™ about two solutions. First, it is possible to change rater-severity with two to three days of structured training, but this is very difficult, time-consuming and totally infeasible for online reviews or NPS scores that depend on consumer interaction. Plus, there is the added complexity that when raters or reviewers are trained to be meticulous and contemplative, they subsequently tend to start using the “safe” middle-rating categories. In other words, if you tell people they are “too easy” or “too hard” in their grading or reviewing, they will avoid their respective extremes and use the “middle” categories more often. Consistent “middle-of-the-road” scores or ratings essentially dilute or hide the very positive or negative feedback that organizations need to know. So, it seems the Trickster strikes again.

There is a second and more technologically grounded solution, however, that levels or equates ratings across different reviewers with different severity biases. And, as Dr. Lange noted, it has been proven effective in courtroom settings with judges’ attitudes toward leniency in rulings, with HR and manager performance appraisal reviews, and even with consumer product reviews. It is an advanced statistical analysis known as “multi-facet Rasch scaling” (MFRS), which can be performed by qualified operators using the FACETS computer software by Michael Linacre.
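
Without attempting the full method, the core idea can be sketched in a few lines. The code below is a simplified illustration of the many-facet Rasch rating scale model as it is commonly written, in which the log-odds of moving from one rating category to the next equal the quality of the thing being rated, minus the difficulty of the question, minus the severity of the rater, minus a step threshold. It shows only how a rating is modelled once those quantities are known. Estimating them from real review data is the job of software such as FACETS and is not shown. All names and parameter values are hypothetical.

```python
import math


def category_probabilities(quality, difficulty, severity, thresholds):
    """Probabilities of rating categories 0..K under a many-facet
    Rasch rating scale model (all inputs in logits).

    thresholds[h-1] is the step threshold for moving from category
    h-1 to category h.
    """
    # Cumulative sums of (quality - difficulty - severity - threshold);
    # category 0 has an exponent of 0 by convention.
    exponents = [0.0]
    for f in thresholds:
        exponents.append(exponents[-1] + (quality - difficulty - severity - f))
    peak = max(exponents)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_rating(quality, difficulty, severity, thresholds):
    """Model-implied average rating for one rater on one question."""
    probs = category_probabilities(quality, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: one hotel, one question, a 0-3 rating scale.
steps = [-1.0, 0.0, 1.0]
lenient = expected_rating(0.5, 0.0, -0.5, steps)  # easy-to-please rater
strict = expected_rating(0.5, 0.0, 0.5, steps)    # hard-to-please rater
print(lenient > strict)  # True: same hotel, different expected ratings
```

Because rater severity sits in the model as its own term, it can be estimated and then removed, which is what allows ratings from strict and lenient reviewers to be placed on a common footing.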

The mathematics are too technically complex to explain here, but companies that care about accurate insights from data-mining, especially using information derived from customer ratings and reviews, are strongly encouraged to confront the Trickster in market metrics by never taking raw numbers like NPS scores at face value. The best practice is to leverage the power and accessibility of leading-edge software and statistical analyses to understand this type of data with nuance and intelligence. Most organizations do not even know the problem exists, and those that do likely will not take the time to actually fix it with MFRS. Those that do, however, will find they have a far greater handle on their data and, by avoiding the Trickster, are ultimately able to make better decisions. For those companies, the value of data will go up even further.