How to Fuzzy Match Company Name

12 Feb 2015

I have been a B2B marketer for most of my career. More often than not, there is a need to match two lists based on company name. There are various reasons for this matching; some of the obvious ones are:

  • Named/Strategic
  • Lead Scoring
  • Lead Routing
  • Competitive Analysis
  • Opportunity and Pipeline Analysis
  • Marketing Influence and Marketing Contribution Analysis

A marketing database is often considered a data junkyard: data flows in from many channels, such as purchased lists, in-house lists, agency lists, partner lists, web collection, and user input forms, so the data quality is often very poor. Every source has its own standards and procedures. The resulting misalignment makes it harder to get the information you are actually looking for.

Unlike relational and standardized data, the Company Name is often a free-text field, and users write the same company name in many variations. A name can be spelled in different ways, carry different abbreviations, or be replaced by a completely different name that the company also goes by, all of which makes it harder to pin down. For example, Walmart can appear in many forms:

  • Wal-Mart
  • WalMart
  • WalMart Stores
  • Walmart.com
  • Walmart Inc
  • Walmart Stores Inc
  • Wal-mart Stores Inc.
  • Sam’s Club

Clearly, it is not possible to match these names using an exact string match. I also suggest avoiding Microsoft Excel’s VLOOKUP function, as this generic function is not tuned for company name matching. The process needs a little more work from the individual user.

First things first: know the company name data you are trying to match, especially the following (in order of priority):

  • Company country
  • Size of the company

The company’s country helps identify the incorporation suffix (such as Inc or Corp in the USA), while the size of the company helps flag related identities (such as subsidiaries and acquired companies). All of this information matters because so many businesses have similar names, and you won’t get your best results unless you narrow the candidates down in every way you can before you start matching in depth.

Here are the California Secretary of State’s legal business name extensions:
http://www.sos.ca.gov/admin/regulations/business/business-entity-names.htm#section-21001

There are many data vendors who provide the company size. Once the company size is known, and if the list contains large enterprises, I recommend that you also procure a list of their subsidiaries and acquired companies. It is a pain to keep such a list up to date because it changes constantly, but you will thank yourself for going the extra mile.

After these two critical data points are collected, let’s get rolling. Apply these rules in order, and flag records as soon as they match so that they are not run through subsequent rules.

  • First, do a simple string match. Identify and flag whatever matches.
  • Next, strip the legal business entity suffix (for example, convert Wal-Mart Inc to Wal-Mart). Suffixes serve very little purpose in matching. Again, do a string match.
  • Then clean the company name of any special characters and white space, and do a string match for a third time. (For example, convert Wal-Mart to Walmart.)
  • If more matches are still desired, try an “in string” search in both directions. For example, search for “Walmart” in “Walmart Stores”, and also search for “Walmart Stores” in “Walmart”. One of these will return True.
  • Finally, if desired, use the acquired/subsidiary company list and repeat the four rules above. This will match “Sam’s Club” with Wal-Mart.
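The five rules above can be sketched in Python roughly as follows. This is a minimal illustration, not a production matcher: the suffix set is a short, assumed list of US suffixes, the `PARENT_OF` map stands in for a purchased subsidiary/acquisition list, and all function names are made up for this example.

```python
import re

# Assumed, incomplete set of US legal suffixes; extend it per country.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
                  "llc", "ltd", "co", "company"}

# Hypothetical subsidiary/acquisition list: alias -> parent company name.
PARENT_OF = {"Sam's Club": "Wal-Mart"}


def strip_suffix(name):
    """Rule 2 helper: drop trailing legal entity suffixes such as 'Inc.'."""
    words = name.replace(",", " ").split()
    while len(words) > 1 and words[-1].rstrip(".").lower() in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def squash(name):
    """Rule 3 helper: strip the suffix, lower-case, keep letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", strip_suffix(name).lower())


def match_rule(a, b):
    """Return the number (1-4) of the first rule that matches a and b, else None."""
    if a == b:                                      # rule 1: exact match
        return 1
    if strip_suffix(a) == strip_suffix(b):          # rule 2: suffix removed
        return 2
    sa, sb = squash(a), squash(b)
    if sa and sa == sb:                             # rule 3: cleaned names
        return 3
    if sa and sb and (sa in sb or sb in sa):        # rule 4: in-string, both ways
        return 4
    return None


def match(a, b):
    """Rules 1-4, then rule 5: retry after mapping known subsidiaries to parents."""
    rule = match_rule(a, b)
    if rule is None:
        pa, pb = PARENT_OF.get(a, a), PARENT_OF.get(b, b)
        if (pa, pb) != (a, b) and match_rule(pa, pb) is not None:
            return 5
    return rule


print(match("Wal-mart Stores Inc.", "Walmart Stores Inc"))
print(match("Sam's Club", "Walmart Stores Inc"))
```

Returning the rule number makes it easy to flag each record with the rule that matched it, and to review the looser matches separately. The in-string rule is the one most likely to produce false matches on very short names.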

The rules above are the basic company name fuzzy match routines that should be run at a minimum to get something substantial. However, there are many dimensions and various methods for performing company name fuzzy matches. These include advanced computing (to analyze company name variations and user input), statistical analysis (to find the most probable match), manual resolution (custom matches based on need), advanced data collection (account variations, acronyms, regional names) etc. Fuzzy Match Company, a data analytics company (website: www.fuzzymatchcompanynames.com), offers company name matching as a self-serve online service. If you are short on time or resources, it’s worth exploring what they have to offer.

There are several other well known routes that the marketer can take in order to get the company name matches that they are looking for. However, these can take a bit more time and each of them has problems of their own, so it’s important to take this into consideration when choosing your best path. Let’s look at some of these techniques and how they stack up next to our original method.

One option is SOUNDEX(). It is a phonetic algorithm: it encodes a word as its first letter followed by three digits that represent the consonant sounds that come next, and two names are treated as a match when their codes are equal. However, there are a few problems with this method. Because the code only captures the first few sounds, longer names that differ only toward the end receive the same code and are matched in error. Also, the first letters of the two names have to be the same, or the match will not be found.
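To make both limitations concrete, here is a small sketch of the classic American Soundex encoding in Python. It is written from the standard published rules; the SOUNDEX() function built into a given database may differ in details (code length, handling of H and W), so treat it as an illustration only. The function name `soundex` and the sample names are chosen for this example.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code, treating name as one word."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]                      # the first letter is kept as-is
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:        # skip repeats of the same sound
            code += digit
        if ch not in "HW":                 # H and W do not separate equal codes
            prev = digit
    return (code + "000")[:4]              # pad or truncate to 4 characters


# Names that differ only toward the end collapse to one code.
print(soundex("Walmart Stores"), soundex("Walmart Pharmacy"))
# Names that sound alike but start with different letters do not match.
print(soundex("Kraft"), soundex("Craft"))
```

The first pair prints the same code twice, while the second pair prints two different codes, which is exactly the pair of weaknesses described above.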

While SOUNDEX() has lost a lot of popularity because of these errors, switching to Metaphone brings many of the same problems. Metaphone is known for matching names somewhat more accurately than SOUNDEX(), but it can still be quite problematic. All of these methods can, however, be used in conjunction with each other to further distinguish your matches.

A third option is the Levenshtein function. This is another popular choice among marketers, but it comes with problems of its own. It counts the number of single-character edits (insertions, deletions, or substitutions) needed to turn one company name into another. It therefore works fairly well for names that were entered in only a slightly different manner; however, it can fail to match names whose words appear in reversed order, because the reordering counts as many edits.
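For reference, the edit count can be computed with the standard dynamic-programming approach. The sketch below is a plain Python version for illustration; the function name `levenshtein` and the sample names are chosen for this example, and a real project would more likely use a database function or an existing library.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))         # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute, or keep if equal
        prev = curr
    return prev[-1]


# A small spelling difference gives a small distance.
print(levenshtein("walmart", "wal-mart"))
# The same words in a different order give a large distance.
print(levenshtein("Stores Walmart", "Walmart Stores"))
```

The comparison is case-sensitive, so it is usually worth lower-casing and cleaning both names first. Swapping the word order, as in the second call, produces a large distance even though a human reader would treat the two names as the same company.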

All of these techniques can help you find matches in your databases. The problem with the well-known methods listed after my main method is that they often take much more preparation, and they tend to return too many false matches or miss important ones. If you start with the rule-based method outlined at the beginning, you give yourself a good chance of matching the most company names and of putting your best foot forward for your clients. It takes a bit of work on your part, but that extra effort will make your job a lot easier and is well worth your time: without better matching, you can miss out on a large amount of revenue without ever knowing it.