There are more retail channels available to consumers than ever before. From traditional in-store shopping to web, mobile and social media apps to hybrid shopping models (e.g., buy online, pick up in-store), retail’s reliance on technology has grown, expedited by the digital transformation of operations during the COVID-19 pandemic.
Of course, technological innovations come with increased customer expectations. Guarantees such as same-day and time-window delivery or same-day pickup have raised the stakes, with customers expecting a seamless retail experience. All this builds pressure on retail supply chains to be 100 percent on point, driving increased reliance on advanced technologies such as edge computing, hybrid and multicloud infrastructures, microservice architectures and containers, and connectivity across a myriad of Internet of Things devices.
Success in retail is now premised on leveraging data intelligence efficiently, effectively and instantaneously to operational advantage. Leading retailers have increased investment in IT to build innovative and agile systems that better optimize IT costs and operations. However, this has added significant complexity to integrated and interdependent digital systems, increasing the possibility of failure and faulty behavior. Operations often remain siloed, reducing overall visibility into business systems and making it more challenging to answer questions and extract information to resolve issues proactively and quickly.
In the highly competitive retail industry, the cost of any system downtime can be disastrous, often running into hundreds of thousands of dollars per hour. Every issue and failure can negatively impact the customer experience, resulting in irreversible losses for the business, which is why simplifying triaging and pre-emptive resolution processes with the right tools is now a strategic priority for retailers.
Data Observability for Retail
Effectively harnessing and leveraging the power of data can be a key differentiator for retailers. It’s here that data observability provides an invaluable tool for IT engineers to quickly identify and fix issues, minimize downtime, and ensure business continuity.
Observability deduces the current internal state of IT systems and processes by analyzing the outputs they produce, such as logs, metrics, traces, and data files. Running in real time, it provides deep visibility into, and greater control of, complex distributed IT systems (what’s going on, where, and why), supporting alignment between operational-level agreements and service-level agreements.
Deployed effectively, observability delivers tangible business benefits and superior customer experience. Proactive automated troubleshooting means IT professionals spend less time triaging issues and more time fixing them. System downtime is reduced, improving revenue, optimizing customer experience, and preserving brand value.
While observability is critical in helping IT teams find fixes to issues quickly, achieving it at scale has proved challenging for retailers for several reasons. IT failures can span multiple systems, so using disparate observability tools delays resolution as problems extend beyond the system being observed. Rapid digital transformation often results in a complex retail technology stack in which systems from hundreds of vendors are integrated with one another, adding further possible failure points. Also, high volumes of historically siloed observability data make triaging resemble hunting for the proverbial needle in a haystack.
IT alert fatigue can also be a challenge as "native" observability tools produce alerts at pre-defined intervals. Unable to prioritize alerts according to their severity, operations teams can waste hours scanning for signals in the noise. Finally, a lack of internal support and alignment between stakeholders and business leaders can make achieving observability outcomes challenging.
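The alert-prioritization problem described above can be illustrated with a minimal sketch. It assumes each alert is a simple record with `system`, `message` and `severity` fields, and that severities follow a fixed ranking; the field names, the ranking and the sample alerts are all hypothetical, not taken from any particular tool.

```python
from collections import Counter

# Hypothetical severity ranking (an assumption): lower number = more urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "info": 3}


def prioritize(alerts):
    """Collapse repeated alerts, then sort by severity and repeat count."""
    # Count identical (system, message, severity) alerts so repeats
    # fired at fixed intervals show up once with a count.
    counts = Counter(
        (a["system"], a["message"], a["severity"]) for a in alerts
    )
    unique = [
        {"system": s, "message": m, "severity": sev, "count": n}
        for (s, m, sev), n in counts.items()
    ]
    # Most severe first; among equal severities, most frequent first.
    return sorted(
        unique, key=lambda a: (SEVERITY_RANK[a["severity"]], -a["count"])
    )


# Illustrative sample data only.
alerts = [
    {"system": "pos", "message": "heartbeat late", "severity": "info"},
    {"system": "payments", "message": "gateway timeout", "severity": "critical"},
    {"system": "pos", "message": "heartbeat late", "severity": "info"},
    {"system": "inventory", "message": "sync lag", "severity": "major"},
]

ranked = prioritize(alerts)
for alert in ranked:
    print(alert["severity"], alert["system"], alert["message"], alert["count"])
```

Even this simple ordering puts the payment failure ahead of repeated low-priority heartbeats, which is the behavior operations teams need before any more sophisticated analysis is applied.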
Achieving Observability at Scale
The value of observability is clear, but what do retailers need to successfully scale observability across their entire IT ecosystem to solve operational challenges?
The observability solution must be the right fit, able to abstract away technological complexity so engineers can identify, triage and fix issues quickly, at scale. The first step is therefore an agreed-upon, coordinated implementation strategy with buy-in from all stakeholders. The solution itself must be platform-agnostic, easily integrated, and configurable right out of the box with multiple cloud infrastructure elements, all data sources, and enterprise technology systems such as SAP, Salesforce or Oracle. It must also understand how business processes flow in physical and digital stores, and allow configuration for the retailer’s functions and processes, to achieve observability at scale from day one.
Retail business footprints are complex, spanning thousands of stores, warehouses, 3PL hubs and other facilities. Delivering excessive data to engineers without any visualization to make sense of it means digging through billions of data points in multiple logs, dramatically slowing resolution. Within a retail system, every transaction passes through multiple systems before completion. Visualizing the flow of those transactions becomes central to effectively triaging issues or identifying transaction failure trends. An observability solution should therefore be able to "blueprint" transactions to help engineers infer how their current status influences the behavior of digital services. Effective visualization makes observability more accessible, expediting issue remediation.
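As a rough illustration of what "blueprinting" a transaction might involve, the sketch below groups events by transaction, orders them by time, and reports the first expected system that never confirmed success. The expected path, the event fields (`txn`, `system`, `status`, `ts`) and the sample events are assumptions made for the example; a real retail flow would be more involved.

```python
from collections import defaultdict

# Hypothetical expected path for an online order (an assumption).
EXPECTED_PATH = ["web", "payments", "inventory", "fulfillment"]


def trace_transactions(events):
    """Rebuild each transaction's path and find where it stalled, if anywhere."""
    by_txn = defaultdict(list)
    for event in events:
        by_txn[event["txn"]].append(event)

    report = {}
    for txn, txn_events in by_txn.items():
        # Events may arrive out of order from different systems.
        txn_events.sort(key=lambda e: e["ts"])
        completed = [e["system"] for e in txn_events if e["status"] == "ok"]
        # First expected system that did not report success, or None.
        stalled_at = next(
            (s for s in EXPECTED_PATH if s not in completed), None
        )
        report[txn] = {"path": completed, "stalled_at": stalled_at}
    return report


# Illustrative sample data only.
events = [
    {"txn": "A", "system": "payments", "status": "ok", "ts": 2},
    {"txn": "A", "system": "web", "status": "ok", "ts": 1},
    {"txn": "A", "system": "inventory", "status": "ok", "ts": 3},
    {"txn": "A", "system": "fulfillment", "status": "ok", "ts": 4},
    {"txn": "B", "system": "web", "status": "ok", "ts": 1},
    {"txn": "B", "system": "payments", "status": "error", "ts": 2},
]

report = trace_transactions(events)
for txn, info in report.items():
    print(txn, info["path"], "stalled at:", info["stalled_at"])
```

Aggregating the `stalled_at` values across many transactions is one simple way to surface failure trends by system, which a dashboard could then visualize.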
Deploying a Software-as-a-Service (SaaS) observability tool that provides a single, unified source of visibility is critical to success. Such a tool should present data through comprehensive dashboards that show the status of enterprise systems and technologies across facilities, and across the entire IT ecosystem, in real time.
Finally, instead of hunting for that needle in the haystack, any modern observability solution should suppress noise, using artificial intelligence-powered triaging to automatically identify system problems and root causes in real time. Focusing on the right problem enables the prioritization of business-critical issues, ideally resolving most issues without requiring human attention. This is possible through deep reinforcement learning, capturing the system state and identifying the next-best action to restore the optimal state. However, humans should be kept in the loop as required. For example, if auto-remediation is unsuccessful, the system should flag the anomalous behavior and escalate it for human intervention and resolution.
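The control flow of such a loop (observe the state, pick a next-best action, retry, then hand over to a human) can be sketched in a few lines. This is not reinforcement learning: a fixed lookup table stands in for a learned policy, and the states, actions and simulated outcomes are all hypothetical.

```python
HEALTHY = "healthy"

# Hypothetical policy table: observed state -> next-best action.
# In a learned system this mapping would come from a trained model.
POLICY = {
    "queue_backlog": "scale_workers",
    "service_hung": "restart_service",
}

# Hypothetical simulated outcomes of applying an action in a state.
OUTCOMES = {
    ("queue_backlog", "scale_workers"): HEALTHY,
    ("service_hung", "restart_service"): "service_hung",  # restart does not help
}


def apply_action(state, action):
    """Simulate an action; unknown combinations leave the state unchanged."""
    return OUTCOMES.get((state, action), state)


def remediate(state, max_attempts=3):
    """Try automated fixes, and flag for human attention if they fail."""
    actions = []
    for _ in range(max_attempts):
        if state == HEALTHY:
            break
        action = POLICY.get(state)
        if action is None:
            break  # No known action for this state.
        actions.append(action)
        state = apply_action(state, action)
    return {
        "final_state": state,
        "actions": actions,
        "escalate": state != HEALTHY,
    }


print(remediate("queue_backlog"))
print(remediate("service_hung"))
print(remediate("disk_full"))
```

The key design point is the bounded retry: the loop gives automation a fixed number of attempts and then returns an explicit escalation flag, so a person is brought in both when fixes fail and when the state is one the policy has never seen.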
The Key Takeaway
Retail is transforming rapidly, with businesses seeking to adopt new technology and innovate their services quickly to maintain competitive advantage. With growing complexity in the retail technology landscape, and given the serious consequences of failure, complete, agnostic observability is the key to powering frictionless operations and must be a top focus in the digital transformation strategy of a retail business.
Rajiv Nayan is vice president and general manager at Digitate, a provider of SaaS-based enterprise software for IT and business operations.
Rajiv Nayan is responsible for growing Digitate’s ignio business in the Americas and is based at Digitate’s headquarters in Santa Clara, California. Rajiv has an accomplished career of more than two decades in the IT services industry, across sales and delivery functions. Rajiv has a degree in mechanical engineering from Delhi Technical University and an MBA from the Rotman School of Management at the University of Toronto.