Geonormalization in Faraday
At Faraday, we automatically detect and correct for geographic bias in our training data through a process we call “geonormalization”.


Why
It's probably easiest to explain through an example, so let's walk through one.
Company A has grown steadily by finding new customers in California since opening in San Francisco. Now they want to expand into new markets across the Western US.
Because they are just starting to expand, they don’t have many examples of customers in NV, OR, or WA. There is also a thornier issue: the potential customers in those three states are going to look very different from the pool of potential customers in CA.
If we deployed a naive model across those 4 states with the objective of new market expansion, the greedy nature of our propensity model would take a look at all the existing CA customers and the new pool of potential candidates and conclude that everyone from CA is more likely to become a customer than anyone from NV, OR, or WA. Not very helpful for expansion, is it?
How
Hence the need for a correction factor to ensure that we can find the most likely customers in all of the regions our client wants to expand into. Our approach requires a detection algorithm and a correction algorithm.
Detection
We use a simple but effective algorithm. We compare the market penetration (customers / total possible customers) across the distinct geographic areas at the desired level of analysis. If the largest market penetration is more than 100x the smallest market penetration, we apply our correction algorithm during training.
For detection, we start at the largest geographic area of interest, the state level, and work our way down to the smallest geographic area of interest, the postcode level. As soon as the 100x threshold is met at some level, we perform the correction for that geographic level. Therefore, we always apply the correction at the largest level possible.
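To make the detection step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Faraday's actual implementation: the record layout (dicts with "state", "postcode", and "is_customer" keys), the function names, and the handling of areas with zero customers are all hypothetical. Only the 100x threshold and the state-to-postcode ordering come from the description above.

```python
from collections import defaultdict

THRESHOLD = 100.0               # the 100x ratio described above
LEVELS = ["state", "postcode"]  # largest to smallest; intermediate levels could be added


def penetration_by_area(people, level):
    """Return {area: customers / total possible customers} for one level."""
    totals = defaultdict(int)
    customers = defaultdict(int)
    for person in people:
        area = person[level]
        totals[area] += 1
        customers[area] += int(person["is_customer"])
    return {area: customers[area] / totals[area] for area in totals}


def exceeds_threshold(rates, threshold=THRESHOLD):
    """True if the largest penetration is more than `threshold` times the smallest."""
    lowest, highest = min(rates.values()), max(rates.values())
    if highest == 0:
        return False  # no customers anywhere, nothing to compare
    if lowest == 0:
        return True   # assumption: an area with zero customers counts as an infinite ratio
    return highest / lowest > threshold


def detect_level(people, levels=LEVELS):
    """Return the largest level at which the threshold is met, or None."""
    for level in levels:
        if exceeds_threshold(penetration_by_area(people, level)):
            return level
    return None
```

With a population where CA penetration is 50% and NV penetration is 0.1%, `detect_level` returns "state", so the correction would be applied at the state level and the postcode level is never examined.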
Correction
We use a two-pronged approach to correction: instance weighting for the training examples, and bias correction for the training features.
Instance weighting
We assign each positive example a weight calculated from the penetration rate of positive examples in its geographic area. In essence, this levels the playing field: it boosts positive examples from areas with relatively few examples (unsaturated markets) and penalizes positive examples from saturated markets.
From a modeling perspective, we are tricking the model into believing that good examples are just as likely to exist in any of the geographic segments. This leads to less bias among the features used to distinguish good examples from bad examples.
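The weighting idea can be sketched as follows. The exact formula is not given here, so this example assumes one plausible choice: each positive example is weighted by the inverse of its area's penetration rate, then the positive weights are rescaled so they average 1. The function name, the "area" and "label" keys, and leaving negative examples at weight 1 are all assumptions made for illustration.

```python
def instance_weights(examples, penetration):
    """Weight positives by inverse area penetration; negatives keep weight 1.

    `examples` is a list of dicts with hypothetical keys "area" and "label".
    `penetration` maps area -> customers / total possible customers.
    """
    # Raw weight: rarer positives (low penetration) get a larger value.
    raw = [
        1.0 / penetration[ex["area"]] if ex["label"] == 1 else None
        for ex in examples
    ]
    positives = [w for w in raw if w is not None]
    if not positives:
        return [1.0] * len(examples)
    # Rescale so the positive weights average 1 (total positive mass is unchanged).
    scale = len(positives) / sum(positives)
    return [1.0 if w is None else w * scale for w in raw]
```

Under this scheme a positive example from an unsaturated market ends up with a weight above 1 and one from a saturated market ends up below 1. Most training libraries accept such values as per-example sample weights.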
Feature correction
The other method, which we employ in tandem with instance weighting, is to transform features so that they are comparable across the geographic segments. Instead of providing the model with raw features, we transform them so each person is represented by the percentile they occupy within their segment. Again, this is probably easiest to explain via an example.
Instead of providing raw household income values of $84k for CA resident A and $66k for NV resident B, we provide the model with how they rank among their peers within their respective states. I’ve chosen the median income for each state in this example, so the model would receive 50 for resident A and 50 for resident B.
By correcting the features, we give the model fewer features it can use to discriminate by geography, making it harder to conclude that everyone from the more saturated geographic segment is a better fit than everyone from the less saturated segments.
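The percentile transform can be sketched in a few lines of Python. This is an illustrative version, not Faraday's implementation: the function name and row layout are hypothetical, and the mid-rank percentile definition (values below, plus half of the ties) is an assumption chosen so that a segment's median maps to 50. The income figures other than $84k and $66k are made up for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def percentile_within_segment(rows, feature, segment_key):
    """Replace a raw feature with its percentile (0-100) inside each segment."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[segment_key]].append(row[feature])
    for values in by_segment.values():
        values.sort()

    percentiles = []
    for row in rows:
        values = by_segment[row[segment_key]]
        below = bisect_left(values, row[feature])            # strictly smaller values
        ties = bisect_right(values, row[feature]) - below    # equal values, incl. this row
        percentiles.append(100.0 * (below + 0.5 * ties) / len(values))
    return percentiles


# Illustrative incomes in thousands of dollars; 84 and 66 are the segment medians.
rows = (
    [{"state": "CA", "income": v} for v in (40, 60, 84, 120, 200)]
    + [{"state": "NV", "income": v} for v in (30, 50, 66, 90, 150)]
)
ranks = percentile_within_segment(rows, "income", "state")
```

Here the CA resident earning $84k and the NV resident earning $66k both receive 50.0, so the model sees them as equivalent relative to their own states.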
Conclusion
Geographic bias correction, aka “geonormalization”, means that when using Faraday to expand your business into new markets, you can do so confidently knowing that our models will not just default to recommending people from markets in which you already have a significant presence. Grow your business confidently with predictions from Faraday.