Data has a story to tell. If only we allowed ourselves to listen.
This article is a collaboration with David Gossett, Principal with Infornautics, who builds first mover technologies that have no instruction set and need to be invented from scratch. He believes data has a story to tell if we apply the right machine models. His specialty is unstructured data. This article is intended to be provocative, to summon curiosity into the issues that plague us today when it comes to machine learning. We are anomaly hunting.
Three years ago, I wrote this article, Artificial Intelligence Needs to Reset. The AI hype that was supposed to translate into all things automated has yet to deliver. Since that time, we’ve hit speed bumps that exposed issues including a lack of model accountability (black boxes), bias, and poor data representation in training sets. An AI Ethics movement emerged to demand more responsible tech, increased model transparency and verifiable models that do what they’re supposed to do without impairment or harm to individuals or groups in the process.
Our future is Artificial Intelligence. It’s been conjectured that this wonderful AI will be our savior. We are constantly in a state of information overload. As we generate petabytes and petabytes of data every second of the day, we do not have the human capacity to make sense of this deluge of information. We have come to increasingly rely on machines to do this for us. And therein lies the rub. It’s not working. Not really.
From Deductive to Inductive Models
We have moved from a time where humans have created the rules to a time where decisions are now largely governed by data.
Rules have become the basis of how we live day to day and have, over time, become established norms or practices that are refined as we learn and evolve. To prevent car accidents and keep pedestrians from being hit in highly-trafficked intersections, traffic lights dictate exactly when a person is permitted to cross and when a vehicle may proceed. This is an example of Deductive Reasoning. There may be a first premise or two and finally a conclusion based on the evidence. This rules-based system is very human-centric. X happens? Then do Y.
Premise 1: There are high incidents of car accidents in highly-trafficked intersections.
Premise 2: There are high incidents of people getting hurt or killed in highly-trafficked intersections.
Rule: Install traffic lights in highly-trafficked intersections.
With the advent of Big Data, we have shifted from Deductive Reasoning to Inductive Reasoning. Inductive Reasoning starts from copious observations, with the goal of finding patterns that support an overall generalization. The creation of models from data is known as model induction. These general rules are statistical and therefore do not hold 100% of the time. The observations continue “until we get closer and closer to the truth, which we can approach but not ascertain with complete certainty.” Applying Inductive Reasoning to the two scenarios above would look like this:
Data: In Toronto and New York, there are high incidences of traffic accidents.
Hypothesis: Large cities tend to have high incidences of traffic accidents.
Rule: Where urban intersections see at least 1000 cars and more than 500 pedestrians between 9 am and 5 pm, install traffic lights.
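As a rough illustration of model induction, the sketch below learns a car-count cutoff from a handful of observations instead of having a person write the rule. The records, the `induce_cutoff` helper and the single-feature rule are all hypothetical, chosen only to show that an induced rule is statistical and can still get some cases wrong.

```python
def induce_cutoff(observations):
    """Pick the car-count cutoff that best separates accident-prone intersections.

    observations: list of (cars_per_day, accident_prone) pairs (hypothetical data).
    Returns (cutoff, accuracy) for the rule "cars_per_day >= cutoff".
    """
    best_cutoff, best_accuracy = None, -1.0
    for cutoff in sorted({cars for cars, _ in observations}):
        # Count how many observations the candidate rule classifies correctly.
        correct = sum((cars >= cutoff) == prone for cars, prone in observations)
        accuracy = correct / len(observations)
        if accuracy > best_accuracy:
            best_cutoff, best_accuracy = cutoff, accuracy
    return best_cutoff, best_accuracy


# Hypothetical intersections: (cars per day, was it accident-prone?)
observations = [
    (200, False), (450, False), (800, False), (950, True),
    (1000, True), (1100, False), (1200, True), (1500, True),
]

cutoff, accuracy = induce_cutoff(observations)
print(f"induced rule: cars >= {cutoff} (correct on {accuracy:.0%} of observations)")
```

The induced rule fits most of the observations but not all of them. The busy intersection with no accident history is exactly the kind of case a purely statistical rule glosses over.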
Today, humans write computer programs. The customer tells the programmer what functionality is needed and the programmer then designs and builds the application.
Tomorrow, as AI becomes the norm, the computer will write the program for us. The customer will provide the functional requirements and the data to the computer and it will write the application without any human intervention.
Future decisions will be driven by inductive models, more probabilistic-based with observations that will ultimately influence the new rules and decisions that are made. This, as we’ll see later, poses a dilemma.
Curve-Fitting And the Consequences of P-Hacking
When researchers create models, standard operating procedure is to create a line of best fit. This simply means that given a set of observations (or data) we need to determine a single function or mathematical relationship among that data. This mathematical relationship can then be used to predict outcomes from new inputs, based on how inputs related to outcomes in the original data. For example, researchers will use various non-linear regression approaches to fit the data. Once the data is fed into the model, it generates a score: the higher the score, the better the model fit.
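To make the idea of a fit and its score concrete, here is a minimal sketch using an ordinary least-squares straight line and R² as the score. The data points are made up, and R² (where 1.0 is a perfect fit) is an assumed stand-in for whatever scoring scale a given tool reports.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line (1.0 = perfect)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations: a roughly linear trend with one point far off it.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 25.0, 14.1, 16.0]

slope, intercept = fit_line(xs, ys)
score = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={score:.3f}")
```

A single point that sits far from the trend is enough to pull the score well below 1.0, which is the temptation to "clean" it away.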
In the example below, the resulting graph shows just an ‘OK’ fit. Assuming a perfect score of 1000, the best model returns a score of 744. Given the number of models used to curve-fit the data, the results show that observations that tend to be ‘alone’ or ‘outside’ from where most observations tend to cluster – are outliers. These are represented by the red arrows. These are ANOMALIES. They are the protagonists of this article. The overlooked. The omitted.
WHY SHOULD WE CARE ABOUT ANOMALIES?
In most models there are outliers. Some of those outliers may be considered anomalies. Anomalies are ‘different’ or ‘abnormal’. Because they do not follow a common trend, many researchers will tend to dismiss them as noise. Researchers will try various regression models until they’ve gotten a great score or the best fit possible.
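One way to look at outliers rather than discard them is to inspect the residuals of a fitted curve. The sketch below assumes a hypothetical fitted line (y = 2x) and made-up observations, and it uses a median-based modified z-score with the conventional 3.5 cutoff. That scoring rule is one common choice, not something prescribed in this article.

```python
import statistics


def flag_anomalies(xs, ys, slope, intercept, threshold=3.5):
    """Return the (x, y) points whose residuals sit far from the typical residual."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = statistics.median(residuals)
    # Median absolute deviation: a spread measure that one extreme value cannot inflate.
    mad = statistics.median([abs(r - center) for r in residuals])
    if mad == 0:
        return []
    flagged = []
    for x, y, r in zip(xs, ys, residuals):
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        modified_z = 0.6745 * abs(r - center) / mad
        if modified_z > threshold:
            flagged.append((x, y))
    return flagged


# Hypothetical observations and an assumed fitted line y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 25.0, 14.1, 16.0]

anomalies = flag_anomalies(xs, ys, slope=2.0, intercept=0.0)
print("points worth a second look:", anomalies)
```

The output is a list to investigate, not a list to delete.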
I was hiking with one researcher when I told him, “You shouldn’t be curve fitting, you know! You’re just hacking the data into a straight line.” The guy didn’t talk to me for another hour on the hike. That’s how offended he was. His justification was that curve fitting removes the noise.
This paper, The Extent and Consequences of P-hacking, defined it this way:
“One type of bias, known as ‘p-hacking’, occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant.”
The paper concluded that in the scientific community there are great incentives to publish statistically significant results. Employers, funders and reviewers count a journal’s impact to assess a researcher’s performance.
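A small simulation can show why this matters. In the sketch below, every simulated study compares two groups drawn from the same distribution, so any "significant" result is a false positive. The sample sizes, the z-test with known variance, and the choice of 20 attempts per study are all assumptions made for illustration.

```python
import math
import random
import statistics


def p_value_no_effect(rng, n=20):
    """Two-sided z-test p-value for two groups drawn from the SAME distribution."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Known unit variance in both groups, so the difference of means has variance 2/n.
    z = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(2.0 / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(tries_per_study, studies=1000, alpha=0.05, seed=0):
    """Share of no-effect studies that still report p < alpha.

    tries_per_study models how many analyses a researcher runs before
    reporting only the best one.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        best_p = min(p_value_no_effect(rng) for _ in range(tries_per_study))
        if best_p < alpha:
            hits += 1
    return hits / studies


honest = false_positive_rate(tries_per_study=1)
hacked = false_positive_rate(tries_per_study=20)
print(f"one analysis per study: {honest:.1%} false positives")
print(f"best of 20 analyses:    {hacked:.1%} false positives")
```

With one analysis per study the false positive rate stays near the 5% threshold. Reporting the best of 20 attempts pushes it to well over half, even though nothing real is there to be found.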
In an 8-year cancer research study, there was an attempt to reproduce results from 193 experiments gleaned from 53 top cancer research papers. The study concluded that less than 25% of the experiments were reproducible. It was noted that authors for ⅓ of the experiments did not reply to requests for more information. Others exhibited hostility towards those who wanted to replicate their work. This article validated similar conclusions from the P-hacking paper:
“replication may also feel intimidating because scientists’ livelihoods and even identities are often so deeply rooted in their findings…“Publication is the currency of advancement, a key reward that turns into chances for funding, chances for a job and chances for keeping that job… Replication doesn’t fit neatly into that rewards system.”
In our alternative universe, we question why we would want to limit ourselves to curve fitting. Instead of p-hacking the data, why don’t we analyze the anomalies?
Consider that most of the airplanes these days are piloted by a computer programmer. Subject matter experts and engineers collaborate and write code for every possible scenario a plane may encounter. But, sometimes, not all scenarios have been considered.
“While airlines have long used automation safely to improve efficiency and reduce pilot workload, several recent accidents, including the July 2013 crash of Asiana Airlines Flight 214, have shown that pilots who typically fly with automation can make errors when confronted with an unexpected event or transitioning to manual flying.” ~ Inspector General in a letter to the FAA
In 2009, an Air France plane, en route from Brazil to Paris, crashed into the Atlantic Ocean after the auto-pilot malfunctioned and crew error caused the plane to stall. All 228 aboard died. The investigation found “external speed sensors had been frozen and produced irregular readings, and the aircraft sent into an aerodynamic stall”.
In 2014, an AirAsia plane crashed into the Java Sea after the auto‐pilot kicked off in bad weather and the pilot’s bad decision put the plane into a stall that led to 162 deaths.
What was common among these tragic airline events? Programmers did not write a line of code for these scenarios. Also, the pilots are becoming shockingly illiterate in the cockpit. When a computer code is missing a line and control of the aircraft must be handed off to the pilot, the pilot is completely out of practice.
Nassim Nicholas Taleb, author of The Black Swan, said this:
“A life saved is a statistic; a person hurt is an anecdote. Statistics are invisible; anecdotes are salient.”
Americans will remember 9/11 and its 2,977 deaths because of what we all witnessed that day on national television. By comparison, the statistics of WWII, in which an estimated 75 million people died, and the number of war heroes who made it home, feel less poignant. It’s these salient incidents that drive tremendous change, as evidenced by the geo-political events post 9/11.
But had these anomalies been paid enough attention, could these crashes have been prevented?
Recently, an air passenger, with no experience flying, was able to safely land the plane when the pilot fainted. The passenger displayed the skills equivalent to a student completing his first solo flight. Was this incident really that rare? Should this be analyzed to determine ways to make it possible for passengers without flying experience to safely land planes in such emergencies?
WHY IS NO ONE PAYING ATTENTION TO THE ANOMALIES?
Anomalies have existed from the beginning of time. The unpredicted, unplanned and yet hugely consequential: the Enron scandal that shook Wall Street. The Bre-X $6 billion mining fraud. Wars (WWI, WWII, Vietnam) are full of “unknown unknowns”. Do we realize the long term effects of the decisions we make today? We recognize the government-granted Covid-relief benefits that helped millions who were adversely affected. But will our children feel the downstream effects as they pay higher taxes in the next decade?
The difference today is that with the invention of the internet, we are now overwhelmed with petabytes and petabytes of information. The sheer volume of data has made humans rely more and more on algorithms to make sense of it all. In this data there is more knowledge, more nuance, more consequence. But, we tend to drift towards those events that are more likely to occur, the inconsequential. Nassim Nicholas Taleb, Author of The Black Swan and Antifragile said this:
“I find it scandalous that in spite of the empirical record we continue to project into the future as if we’re good at it, using tools and methods that exclude rare events. Prediction is firmly institutionalized in our world.”
We did a simple search on tech sites like IBM, Microsoft, Cisco and on consulting and sector-specific company sites to see how many times the term “artificial intelligence” appeared vs. “anomaly” or “anomaly detection”.
Directionally, what we found was that most sites consistently return significantly more results for machine learning and artificial intelligence, while results for anomaly or anomaly detection paled in comparison.
However, when we compared keyword search results for “anomaly detection”, what stood out was that tech companies (IBM, Microsoft, Cisco, Intel, Oracle) understood that anomalies do matter. Consulting companies mention “anomaly” less often, and there is virtually no mention of it among the insurance, finance and industry companies. Given the AirAsia and Air France crashes described above, it stands to reason that “anomaly detection” should get more mention in the airline sector.
THE TRICKLE DOWN EFFECT
Typically, what drives results are incidences we can account for – ones that are statistically significant, and drive a high probability of success.
The C-Suite drives the objectives. When I worked for a large publishing platform, our key objectives were Reach, Revenue and Engagement. These imperatives trickled down to employees, who were measured against them. Their jobs and their bonuses relied on their individual performance against each of these initiatives.
I knew someone who worked as the Privacy Lead for a big tech company, reporting directly to the CEO. Her job was to ensure that users could easily and effectively find and navigate their privacy settings. Her job performance was measured by user satisfaction when it came to their privacy. She made sure she collaborated with the engineers who managed the relevant pages to ensure she met her goal. However, the engineers were incentivized to ensure users were engaged on the platform. Some of the page recommendations directed by the privacy lead would hinder their ability to adequately meet their goal. Since the imperatives imposed by the C-Suite (remember, engagement was a key imperative) were more closely aligned to the engineers’ goals, guess who ended up meeting their goal?
Typically, objectives dictated by the C-suite, will generate outcomes that are self-fulfilling. What’s clear is that employees will only deliver the tasks (driven by the C-Suite) that will yield expected outcomes. If there are anomalies, they are thrown away and characterized as unlikely-to-happen-again events, or simply, noise. They don’t pay attention to them.
Here’s the problem: The employees who are using AI are giving the C-suite the information it asked for, on which their bonuses depend, and outcomes that are aligned with the overall objectives. These outcomes will offer no surprises because the data will regress to the mean.
AI DOES NOT BELONG IN THE C-SUITE
This trickle-down effect from the C-suite also surfaces another effect: we’ve become so focused on our specific jobs and tasks that we are unable to see the big picture. People have become so accustomed to being cogs in a wheel that, when they get promoted to the wheelhouse, they cannot adequately perform.
Iain McGilchrist wrote “The Master and His Emissary”. He studied the relationship between our two brain hemispheres as a “crucial shaping factor in our culture.” He questioned the dominance of the left brain, which is focused on details and specific issues, while the right side sees a much broader view and looks at many data points to understand what is happening. While both co-exist and can depend on each other, the right side can grasp “metaphors, jokes or unspoken implications”, of which the ‘left’ is “decidedly autistic”.
Here’s an interesting analogy: A bird is looking for food in a park. It needs to focus and discern the difference between seeds and pebbles. This focused left-brain activity allows the bird to find its next meal. However, at the same time, the bird needs to be cognizant of any predator that may be in the area. It needs to use its right brain to scan the environment to survive. Notice that both activities, the search for food and awareness of potential predators, are crucial for survival. McGilchrist argues that humans have depended on our left brains for so long that we’ve never built the capacity to effectively make decisions using our right brains, especially when we are in a position of power.
So the C-suite began relying on data, and required employees to feed them the information they needed to effectively make decisions. And because of the trickle-down incentive structure coupled with this left-brain thinking that has dismissed these anomalies, decision-makers will never have all the information required to give them a holistic view of the situation.
By missing the forest for the trees, the C-suite misses the larger implication: the anomaly may point either to the huge risk that may result in ruin, or an outsized market opportunity they can capitalize on.
In 2021, US safety regulators started investigating Tesla’s use of Autopilot after 11 crashes that killed an individual and injured 17 people. Musk insisted the Autopilot system was not flawed. Reports suggest that Musk dismissed an idea that their driver-assistance program should be monitoring drivers, insisting that any human intervention could “make such systems less safe”.
The irony of it all: Because we’ve institutionalized that which could be predicted, we’ve also institutionalized machine learning models to forget anomalies – these false positives, false negatives that are least likely to occur. We’ve turned our backs on these things and that’s why we’re constantly surprised. We’re surprised by the 2008 Financial Crisis. We’re surprised by 9/11. We’re surprised by the Ukraine Invasion. We’re surprised by Covid-19. We’re surprised when Elon Musk wants to buy Twitter.
Should we stop to consider that, had the C-Suite been informed of these anomalies, some of these consequential events could have been prevented? In hindsight, had we paid attention to these outliers, could we have gained additional insight that would have altered the outcome?
Anomalies don’t fit into existing systems. However, they point to new knowledge and the potential to deepen and extend existing theories – the untapped potential, or identified risk.
Daniel Kahneman: “We are prone to overestimate how much we understand about the world and to underestimate the role of chance in events”
Daniel Kahneman’s “Thinking, Fast and Slow” introduces System 1 and System 2 thinking to describe how humans make decisions. This may explain why we are drawn to the normal distribution. It’s a System 1 approach that is “baked in”, instinctive and unconscious, and that has been instituted into mindsets and processes. This site noted why we default to the normal distribution: “It’s easy for mathematical statisticians to work with them. Almost all statistical tests can be derived for normal distributions.”
Let’s examine models and the role of the distribution curve. The shape of the curve reveals where more of the data is lying.
In the middle image below (Symmetrical Distribution), the data is distributed in equal proportion on either side of the central tendency. For example, the data for grocery store X shows that people consistently come in every week to buy basic needs: bread, milk, and eggs. This behavior, which is highly predictable, will tend to sit under the highest point in the curve. Under Normal or Symmetrical Distribution, about 68% of this behavior sits within 1 standard deviation to the left and right of the middle. For data scientists, that’s a good thing.
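The share of a normal distribution that sits within a given number of standard deviations can be checked directly. This short sketch uses Python's standard-library `NormalDist`; the helper name `share_within` is ours.

```python
from statistics import NormalDist


def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

For one standard deviation the share is roughly 68%, rising to about 95% for two and 99.7% for three, which is why anything beyond three standard deviations is so easily written off as noise.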
But this is rarely how society works. That’s not how data science works.
When the summer comes, people will buy more watermelons. This is really the only time period when they’re available, so this will create a Positive Skew. The larger volume of watermelons that people buy will begin to move the mean (or average) to the right. In a perfect world, that mean (or average) would be at the peak of the curve. But as we load more of these watermelon purchases into the model, the positive skewness becomes extreme, and that is not desirable for a model that expects a symmetrical distribution.
What we don’t realize is that there are many examples of Positive Skewness today: income levels; housing prices; seasonal purchases; Etsy’s hand-crafted and vintage goods; premium hair products and so on… The more choice we are given, the more incidences of positively skewed data. The average (mean) will now be greater than the median value and even higher than the mode.
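The ordering of mean, median and mode under positive skew is easy to see on a small example. The income figures below are invented for illustration (in thousands of dollars).

```python
import statistics

# Hypothetical annual incomes in thousands: most are modest, one is very large.
incomes = [30, 30, 30, 35, 40, 45, 60, 250]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)

print(f"mode={mode}  median={median}  mean={mean}")
```

The single large income drags the mean far to the right of where most people actually sit, while the median and mode barely move.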
Side note: Negative Skewness, where the average value is less than the median or mode, is very rare. One example of this: number of fingers. Most people will have 10 total but may lose one or more in accidents.
So now, we have to turn to data transformation tools to bring the skewed data closer to our normal distribution curve. As soon as we do this, our machine models become easy to work with.
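A log transform is one common example of such a tool. The sketch below generates a hypothetical positively skewed sample (log-normal, an assumption made purely for illustration) and compares its skewness before and after the transform.

```python
import math
import random


def skewness(values):
    """Moment-based sample skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)
# Hypothetical purchase amounts with a long right tail.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
# The transform: take the natural log of every value.
logged = [math.log(v) for v in raw]

print(f"skewness before: {skewness(raw):.2f}")
print(f"skewness after:  {skewness(logged):.2f}")
```

The transformed data looks close to symmetrical, which is convenient for the model. It also compresses the very largest values toward the rest, so the extreme cases become much harder to see.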
Here’s the crux of the argument: Companies manage decisions around the mean. According to Kahneman, “We underestimate the role of chance in events”. Tesla dismissed a suggestion that their systems should be taking into account human monitoring. Could this have avoided the 11 anomalous crashes? Plane safety standards and rules only take into account when the pilot is adept/able to fly under ideal conditions. Should they account for anomalous conditions when the pilot is incapacitated?
When we consider anomalies, we have noted these are rare occurrences but they raise suspicions by “differing significantly from most data”. Their outcomes, as evidenced, may be consequential.
KURTOSIS: SELF-FULFILLING PROPHECY?
So as we default to Normal or Symmetrical Distribution, in the era of AI, machines are rewarded for getting it right. They’re rewarded for not being deviant. Our probabilistic tendency is to get as close to the mean as possible because it’s the safest bet for being rewarded. As we push the data towards the mean, it begins to stack the curve and the peak gets higher. This effect is called Positive Kurtosis (see image below). Remember that in a normal distribution curve, about 68% of the data sits within 1 standard deviation of the center. But the more we raise kurtosis, the thinner the curve becomes, and we start to see 75%, 85%, 90% of the data suddenly sitting within 1 standard deviation of the mean. At this point the model gets better and better at guessing this middle.
The more we raise Kurtosis, the more we pay attention to what’s happening in the middle. Hence the more surprised we become when anomalies like that one stock that yielded a much lower than expected return actually occur.
Kurtosis is important because of what it creates: fatter and fatter tails that are increasing in frequency and impact, i.e., the anomalies, while still outliers, have far higher incidences compared to the thinner, normal distribution curve.
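The combination of a taller middle and fatter tails can be checked with closed-form probabilities. The sketch below compares the normal distribution with the Laplace distribution, which has positive excess kurtosis. The Laplace distribution is our choice of example and is not taken from the article's figures.

```python
import math


def normal_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def laplace_within(k):
    """Share of a Laplace distribution within k standard deviations of the mean.

    A Laplace distribution with scale b has standard deviation b * sqrt(2),
    so P(|X - mean| < k * sd) = 1 - exp(-k * sqrt(2)).
    """
    return 1.0 - math.exp(-k * math.sqrt(2.0))


print(f"within 1 sd: normal {normal_within(1):.1%}, Laplace {laplace_within(1):.1%}")

normal_tail = 1.0 - normal_within(4)
laplace_tail = 1.0 - laplace_within(4)
print(f"beyond 4 sd: normal {normal_tail:.6f}, Laplace {laplace_tail:.6f}")
print(f"the fat-tailed curve puts {laplace_tail / normal_tail:.0f}x more weight out there")
```

More of the data sits in the middle, and at the same time events four standard deviations out become far more frequent than the normal curve suggests.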
Kurtosis can be used to measure financial risk. A large Kurtosis is associated with a high level of risk, which indicates higher probabilities of extremely large and extremely small returns.
So when we apply this to these rare world events: The Covid pandemic gave rise to the COVID vaccine and more government rules on masks and mobility, which created fat-tail side effects that emboldened anti-vaxxers and free-speech activists.
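As a rough sketch of how kurtosis reacts to rare extreme returns, the code below computes moment-based excess kurtosis for two hypothetical return series: one well-behaved, and the same series with ten large shocks inserted. The series, the shock size and the helper name are all assumptions for illustration.

```python
import random


def excess_kurtosis(values):
    """Moment-based excess kurtosis: about 0 for normal data, > 0 for fat tails."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


rng = random.Random(7)
# Hypothetical daily returns (in %), drawn from a normal distribution.
calm_returns = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Same series, but every 500th day is replaced by a large gain or loss.
shocked_returns = list(calm_returns)
for i in range(0, 5000, 500):
    shocked_returns[i] = 8.0 if (i // 500) % 2 == 0 else -8.0

print(f"calm series:    {excess_kurtosis(calm_returns):.2f}")
print(f"shocked series: {excess_kurtosis(shocked_returns):.2f}")
```

Ten unusual days out of 5,000 are enough to move the measure sharply, even though the average return is almost unchanged.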
When we fatten the tails, we have higher peaks, smaller shoulders, and higher incidences of very large deviations. ~ Nassim Taleb
ML and Artificial Intelligence love regression. In the Kurtosis example, the purple dot represents the reward. The gray dots outside of the curve are not rewarded. Before Covid-19 there were always fringe groups, religious or otherwise, that did not agree with government mandated vaccinations. But the pandemic exacerbated this movement, which gained momentum in size and frequency and evolved into a freedom movement, something that otherwise would not have been anticipated.
By failing to pay attention to this anomaly, the events previously unseen can grow in size, scale and scope, materialize globally and become uncontrollable.
AI will Only Serve Us if We Consider ALL the Data
If we continue to default to Normal Distribution, and are incentivized to do tasks that report squarely on the objectives of the organizations, and dismiss the values that are least likely to occur, we will be failing the C-Suite.
In the process, we turn a blind eye to potential opportunities for innovation and competitive advantage. We lose sight of the dangers lurking in our midst that may have far-sweeping implications for the business.
It was important to detail what has become the norm as we venture increasingly into machine learning and how we analyze information. In its young lifespan, artificial intelligence, still trying to find its way into the mainstream organization, has managed to create a path that has been faulty from the start. We are experimenting and creating process and policy with the wrong incentive structures, which perpetuate these biases and recurring data issues. We are missing the forest for the trees because we have become complacent in dismissing outcomes we are convinced will not happen again.
Until we think outside of the industry-accepted norms, we will continue down a path towards our own eventual defeat.
In our next post, we’ll offer an alternative method – a consensus approach that makes us more aware and less surprised.
About David Gossett
David Gossett, Principal at Infornautics, believes anomalies are currently being ignored by humans and artificial intelligence alike. David uses advanced models to identify patterns in the outliers, which he believes represent all risk and opportunity for a company. His specialty is unstructured data, and he previously taught a computer to read resumes and decide which candidates should be interviewed for each position. He cut his teeth in Big Four accounting, building a sales force automation system that managed $750MM in new revenue. He also spent time at Enron building trading desk fundamentals and arbitrage tools.