It’s a tale as old as time: you, a customer, enter a store (digital or otherwise) just looking to make a purchase and move on with your life. But you may leave with more than you bargained for: your bananas, NFTs, or limited-edition Furby … plus a record of the personal information you handed over, which could be dangerous in the wrong hands.
As shoppers, we’ve grown so accustomed to giving our names, emails, phone numbers, and sometimes even birthdays to the cashiers ringing up purchases that shouldn’t require that kind of data. With the prevalence of store rewards programs and store credit cards, we barely think about how much personally identifiable information (PII) is being collected and used in ways we never consented to.
Protecting customer PII is a complex task, but one concrete next step retailers can take is to stop using real customer data in software development and testing environments and replace it with realistic synthetic data.
The Pangs of Retail Software Development
When a new rewards program rolls out in a store, there’s an underlying process that has to happen first. The database that will ultimately hold customer information, and the software that gathers that data, must go through extensive development and testing. Software developers need to ensure the database stores the information correctly and returns the right customer record when, say, a phone number is entered at a point-of-sale system.
To test these new software processes effectively, developers must run data through them. Without synthetic data, that means real customer names, phone numbers, emails, and other information are being used to test the program. Every developer working on that software is granted access to potentially large volumes of sensitive data. And if a breach exposes that information to a third party, every individual in that database is at risk of identity theft.
A breach can be a costly and damaging error for retailers. According to the 2021 Cost of a Data Breach Report, as cited by Security Intelligence, the average cost of a data breach in the retail sector was $3.27 million.
Synthetic data removes the risk to PII without hindering developers’ ability to test the program properly before the software goes live. The data is exactly what it sounds like: it mimics customer data, replacing sensitive information with artificially generated names, phone numbers, and any other fields the database requires. With this fake but realistic data, developers can safely and thoroughly test the very software that will be rolled out at the store level. The more tests they can run, the more confident they can be that the program works before it reaches a live environment.
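To make the idea concrete, here is a minimal Python sketch, assuming a simple rewards-program schema invented for illustration (name, email, phone, birthday). It is not how Tonic.ai or any particular product generates data. The helpers `make_synthetic_customers`, `load_into_db`, and `lookup_by_phone` are hypothetical, and an in-memory SQLite table stands in for the rewards database under test. Phone numbers are drawn from the 555-0100 to 555-0199 block reserved for fictional use, and emails use the reserved example.com domain, so no generated record can point at a real shopper. The final check mirrors the point-of-sale scenario: enter a phone number, get back the right customer.

```python
import random
import sqlite3
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical name pools for illustration; a realistic generator would
# draw from much larger, more representative distributions.
FIRST_NAMES = ["Ada", "Ben", "Carla", "Dev", "Elena", "Femi", "Grace", "Hiro"]
LAST_NAMES = ["Lopez", "Nguyen", "Okafor", "Smith", "Tanaka", "Weber"]


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str
    email: str
    phone: str
    birthday: date


def make_synthetic_customers(n: int, seed: int = 42) -> list[Customer]:
    """Generate n fake-but-realistic customer records (hypothetical helper)."""
    if n > 100:
        raise ValueError("only 100 fictional 555-01XX numbers are available")
    rng = random.Random(seed)  # seeded so every test run sees the same data
    suffixes = rng.sample(range(100, 200), n)  # unique, fictional phone suffixes
    customers = []
    for i, suffix in enumerate(suffixes, start=1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        # Random birthday between 1950 and roughly 2004.
        birthday = date(1950, 1, 1) + timedelta(days=rng.randrange(365 * 55))
        customers.append(Customer(
            customer_id=i,
            name=f"{first} {last}",
            email=f"{first.lower()}.{last.lower()}{i}@example.com",  # reserved domain
            phone=f"555-0{suffix}",  # yields 555-0100 .. 555-0199
            birthday=birthday,
        ))
    return customers


def load_into_db(customers: list[Customer]) -> sqlite3.Connection:
    """Load records into an in-memory table with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, "
        "email TEXT, phone TEXT UNIQUE, birthday TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
        [(c.customer_id, c.name, c.email, c.phone, c.birthday.isoformat())
         for c in customers],
    )
    return conn


def lookup_by_phone(conn: sqlite3.Connection, phone: str):
    """The point-of-sale query under test: phone number in, (id, name) out."""
    return conn.execute(
        "SELECT id, name FROM customers WHERE phone = ?", (phone,)
    ).fetchone()


if __name__ == "__main__":
    customers = make_synthetic_customers(50)
    conn = load_into_db(customers)
    # Every synthetic customer should be found by their own phone number.
    for c in customers:
        assert lookup_by_phone(conn, c.phone) == (c.customer_id, c.name)
    # A number outside the generated range should return no match.
    assert lookup_by_phone(conn, "555-0000") is None
    print(f"checked {len(customers)} synthetic customers")
```

Seeding the generator is a deliberate choice: a failing test can be reproduced exactly, and because the records are fake, the same dataset can be shared freely with every developer or vendor who needs it.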
Why is Synthetic Data So Important?
The primary concern isn’t that developers are going to steal customer data and misuse it; in some cases, those developers already have legitimate access to real information. The bigger payoff is that synthetic data allows retailers to work more efficiently and securely with vendors to optimize promotions.
When preparing new promotions in collaboration with other vendors or retailers, exchanging data may be crucial to orchestrating the back-end mechanics of the sale. However, privacy legislation around the world may prevent a retailer from sharing the customer information needed to test the promotion. In Europe, the GDPR’s purpose-limitation principle dictates that personal data cannot be used for purposes incompatible with the one it was collected for (rendering services or products to the consumer). As a result, using data covered by the GDPR or California’s CCPA for software development and testing can put a retailer on the wrong side of the law. Ethically, too, developers simply shouldn’t need to see personal data to do their jobs.
Putting Customer Privacy First
There are several reasons a retailer may want to start using synthetic data, but none is more important than protecting customer information. As retailers rely more heavily on databases of stored customer PII for rewards programs and promotions, they need to seriously consider synthetic data as a way of putting their customers and their privacy first.
Omed Habib is the vice president of marketing at Tonic.ai, a San Francisco-based company pioneering data mimicking and de-identification.
A developer at heart, Omed fell in love with fake data as a way to improve developer productivity. He formerly led product marketing teams at AppDynamics and Harness.io, and helped launch startups from inception to unicorn status. When he’s not faking data, Omed keeps busy geeking out on all things tech, photography, and cooking.