Dirty Data! Improve Your Data Integrity One Step at a Time

“We Can’t Trust That Data”

You need only participate in a board or staff meeting once where sharing a report is met with a response of “That’s not accurate. We can’t trust that data” to know you never want to experience that again! What’s at fault? Dirty data.

Dirty data can be incomplete, incorrect, inaccurate, or duplicative. Maybe the data does not follow the standard conventions you’ve established for your CRM or it otherwise violates your business rules. Dirty data degrades trust in and casts aspersion on your analytics, your system, and by extension, the staff using it. Beyond loss in trust, dirty data can have significant financial implications if you are making decisions on inaccurate information.

Garbage in, garbage out.

If you’ve been managing a CRM for long, you’ve likely heard the maxim “garbage in, garbage out.” This is tongue in cheek; of course your data isn’t garbage! However, the truth is that the quality of information output is reliant on the quality of data input.

As a firm that places a priority on governance and analytics, nearly every project we’ve completed over the past 10 years has included some sort of clean-up to improve data integrity. Data integrity pertains to the accuracy and validity of data throughout its life cycle. If your data has integrity, it is considered largely accurate and valid. It can be trusted. Data lacking integrity, is, well, dirty. It can not be trusted.

Identifying Dirty Data

Not sure if you have dirty data? The bad news is. . . you probably do! Here are some of the usual suspects:

  • Misspellings, punctuation errors, and incomplete values, usually from human error
  • Phone numbers missing area codes or addresses missing zip codes
  • Zip code fields missing the leading zeroes
  • Inconsistent abbreviations
  • Addresses or phone numbers that are fake
  • Duplicate Accounts or Contacts
  • Leads or Opportunities that have not been updated with the accurate Stage
  • Opportunities without designations (GAUs), Soft Credits, or Contact Roles
  • Data that doesn’t match the field type (i.e., a phone number in an email field)
  • Data that is out of a range (e.g., 14 for a month number)
  • Combined Contact records (i.e. Mr. Fred and Ms. Betty Flintstone instead of one Contact record for each)
  • Data from integrations that are not consistent with manual data entry (e.g., using state abbreviations manually when an integration uses the full state name)
  • Data may have been correct at one time, but is now outdated (e.g., bounced email addresses)

7 Steps to Improve Data Integrity

Tackling your dirty data can be a daunting task. Unfortunately, there’s no magic button. However, you can break the work down into smaller steps. Here are our recommendations.

  • Create reports for analysis.

Use report filters to catch records that do not pass the test. You could, for example, create a report of Contacts where the email address does not contain .com or Opportunities where the Closed Date is in the past, but the Stage is Open. We refer to these reports as “Exception Reports” – they will show you the reports that are exceptions to your data standards. The goal will be to get the report to 0, or as close to 0 as possible.

While we’re on the topic of reports, reports and folders can get out of control! Did you know you can run a report on reports to help you with your report clean-up? So meta!

  • Take advantage of List Views.

Customize List Views with Created Date, Created By, or Last Modified By to narrow down your data sets. For example, this is helpful when you know data that was imported by a specific User is several years old or prone to data errors. If you want to be able to create a view you can edit, you can use in-line editing or a tool like Apsona for Salesforce in Tabular View.

  • Know when to switch to Sheets or Excel.

As much as we like to stay in Salesforce for data transformation, sometimes you will need to use Excel or Google Sheets. When you do, consider the following:

  1. Export your data. Always, always, always export your data first. You can use the native Salesforce data export tool or a third-party app like Apsona or Data Loader. Be sure to save your export file to a safe location.
  2. Make a copy of your export file. Make a copy of your export file and rename it with a prefix like “Working”. Save this copy to a safe location as well. This will become your working copy.
  3. Use Excel or Google Sheets to transform your data. Be sure to familiarize yourself with the native options for splitting text to columns, conditional formatting, and removing duplicates. Confirm your cells are formatted correctly. Learning a few Excel or Google Sheet formulas can save you a ton of time when it comes to transforming data. Some of our favorites include:
      • DATE
      • VLOOKUP
      • HLOOKUP
      • CONCATENATE

PRO TIP: Try a demo of an app like Enabler4Excel or G-Connector for Salesforce. These apps allow you to view and edit your Salesforce data from Excel or Google Sheets. Be warned: these are powerful apps!

  • Double-down on Duplicates.

Set up Duplicate Rules and Matching Rules in Salesforce or use a tool like RingLeadDemandTools, or Apsona DeDupe to set up matching criteria to reduce duplicates. You’ll do this object by object, which provides you with the opportunity to split the work into reasonable chunks of time.

  • Archive old data.

Together with your Center of Excellence team, determine a standard for how long you will keep records that are outdated. Examples of records you may wish to archive include old Campaign Member records or Activities. Salesforce Big Objects enable you to archive large data volumes that remain accessible via API.

  • Consider data validation tools.

There are a host of third-party data validation options available on the AppExchange, including Account, Lead, and Contact data verification, address validation, and email and phone verification. These are typically not free tools, but are worth considering before launching any direct mail, email, or phone/SMS Campaign.

  • Prevention is better than the cure.

The best way to combat dirty data is to prevent it in the first place. Set naming conventions to force standard naming of Opportunities, Campaigns, and other high-volume records. Turn on the State and Country Picklist options. Create field validation rules for fields where it’s critical that the data be as complete and accurate as possible. These are declarative tools that come with no extra cost and could save you tons of time and money once implemented. Not to mention the headache from looking at a spreadsheet for hours!

While we couldn’t detail all of the tips and tricks of data integrity here, we hope we’ve given you a head start. It may seem like a huge job, but if you break the work down into actionable steps as we’ve mentioned here, you’ll have clean data in no time. From there the key will be to set up systems to keep the data spic-and-span and use your exception reports as early warning indicators. We’ll share further tips on this in our next blog post!

RELATED CONTENT