• Blog

Thinking Like an Auditor

  • Analytics – Aggregating Spend by Matching Records that Don’t Match

    May 1, 2018 | By Lyle Jacon, BBA, MBA

    My job is to talk to auditors about using data analytics software every day. Many are working on their audit analytics programs and most have issues with data preparation before even conducting audit analytics.

    The most common issue is non-matching data for entries that are referring to the same entity or person. The three most common non-matching fields are company names, individual names, and addresses. This blog covers company names – we’ll save individual names and addresses for future blogs.

    The Issue

    We’ll use the need to aggregate spend at year end as our example. Suppose you need to conduct the aggregation either as part of a routine audit or to check payments against a contractor to help in negotiations. If you can’t aggregate spend accurately, you could miss an opportunity for savings, fraudulent activity, or a control violation.

    The problem is that the data is from a variety of different systems, older legacy systems, different divisions, within different payment types, or from departments with various data entry standards. The differences create inconsistency and increase the likelihood of errors.

    Real-World Example

    During a rebuilding phase, one company experienced “Wild Wild West” spending spree, as they described it. They had multiple contracts, similar contracts for single vendors, payments without contract numbers, and missing vendor numbers on purchase transactions. As a result, the auditor had to spend days aggregating the spending for a single vendor. The data was then submitted to their data analytics specialist who would generate custom code to gather and normalize the data. Each time there was a change or a new vendor identified, the code had to be modified. The auditors wanted to know the details, and they wanted to know it right away. Having to hand off the data and rely on the data analytics specialist’s busy schedule was an inefficient, time-consuming, and frustrating project for the auditors and the data analytics specialist. As a result, it took weeks to identify that policies were being circumvented and spending was outside of contract limits, controls, and individual approval limits.


    Summarize Data and Identify Concerns

    How do you get a handle on hundreds of vendors to aggregate spend? Using a transactional dataset, as an example, your first effort will be to summarize the data by your field of concern with helpful secondary data if available. If your concern is a company name, then summarize the data by company and sort it alphabetically. If there are secondary data fields available, such as address or city in the transaction data, you would use that as your second level of summarization. Be aware that although it may be the same company, you may be purchasing from different outlets/locations causing the address to be different for the same vendor. If you’re also working with additional datasets, such as a transaction listing from a different place or a master vendor listing, you should run the same summarization process. A quick scan through the data, depending on the size, should provide you with an indication of any near-duplicates that are in the dataset. Once identified, you can do further research to verify that they are the same company. When reviewing the various data sources, you may find the following for a single entity.

    • Fictional Supplies Incorporated
    • Fictional Supplies Inc.
    • Fictional Supplies
    • Fictional Supplies Inc
    • The Solution - Data Normalization

    Through your summarization efforts and working frequently with your data, you may identify errors that happen regularly. In our example of business entities, a few you will watch for are incorporated, company, and limited liability corporation.

    You’ll use these data variances to set your rules and to complete data normalization. You can build this into a standard process that you perform every time you receive data.

    To implement this approach:

    1. Identify fields you would like to normalize (company name in this example)
    2. Identify the standard outcome you are looking to show. For example, do you want all companies that are incorporated to show up as “Inc.” or “Incorporated”?
    3. Design your process/procedure to normalize your data fields. For example, in Excel, if you wanted to change every “Inc.” and “Inc” to “Incorporated,” you would first need to search and replace each “Inc.,” then do the same for “Inc.”

    This preparation provides you with a standardized data format. The concern is that you’ll have to do this for a variety of instances – LLC, company, etc. You’ll need to design and run each of these different “search and replace” efforts on your data, and this could still take hours and increases the likelihood of mistakes.

    The Plan

    Normalization of data can be applied to just about any dataset and should be used regularly when the need for summarization is required in an audit.

    For audits that are irregular or new, build general guidelines for normalization, which would include:

    • Set normalization process for global terminologies like “Inc.”, “Inc”, or “Incorporated”
    • Develop best practices to set priorities for identifying additional potential matches and the process for summarization (possibly single sales volume, sampling, or not being able to match a vendor to a vendor number)

    For audits that are regularly completed or repeated, build the process over time and identify normalization requirements. This would include:

    • Global terminologies like “Inc.,” “Inc,” or “Incorporated”
    • Specific terminologies (knowing that one branch entered “Fictional Supplies, Inc.” as “Fiction Supplies, Inc.”)
    • Develop best practices to set priorities for identifying additional potential matches and the process for summarization (possibly single sales volume, sampling, or not being able to match a vendor to a vendor number)
    • Provide additional recommendation guidance to business units on data entry standards and best practices


    In the end, cleaning up mismatched company names will give you a better sense of the total spend per Company. You’ll be in a good position to now take a deep dive into each vendor to look for policy violations and possible fraudulent transactions. Check back again soon as we tackle mismatched names and addresses!

    Register for a Live Webinar
  • View Demo
    Contact Us
    Request More Information