Moving from Chaos to Confidence, One Incident at a Time
As any developer will tell you, no matter how robust and resilient the technology, incidents are an unfortunate reality. Router crashes, service disruptions, or even an unexpected influx of users, like the surge many businesses saw during COVID-19, can cause an application to shut down.
RBC is Canada’s largest bank and a top ten global bank by market capitalization, so ensuring that our systems can adapt and function through technical challenges is a top priority. Applications like NOMI or MyAdvisor involve complex distributed systems, so traditional quality assurance often isn’t enough to prepare for these scenarios.
“In our environment it’s critical to be able to identify and resolve issues impacting our clients as quickly as possible,” says Dan Clark, Senior Director, DevOps & API. “We needed to understand what happens to our clients when things go wrong, practice our incident response and find areas for improvements across the whole stack.”
To strengthen its Incident Management practices, RBC recently took chaos engineering to the next level in a collaborative, interactive cross-team competition called Game Day. Game Day deliberately created incidents throughout the development and test cycles of various applications, allowing both the platform and application teams to practice coordinated incident response and resolution in real time.
The event brought together the Technology & Operations and Personal & Commercial Banking teams at RBC to experiment and learn how applications and platforms respond to different crisis events. Teams used AI Ops systems to monitor incidents, improved the resiliency of their solutions and their average recovery time, and addressed potential vulnerabilities before they became a reality.
“Helping our clients to manage their finances with ease and precision is essential – and we cannot afford to get it wrong,” says Ranji Narine, SVP, Cloud & Transformation. “Game Day was an incredible team effort and resulted in new insights that will change how we develop applications and resolve incidents moving forward.”
RBC is working to bring the Game Day results and framework to the rest of the organization and is identifying new applications to include in the next set of scenarios.
“The goal is to run these scenarios regularly, and add more tests based on learnings from past incidents,” says John Keenleyside, Director, Cloud Engineering & Principal Technologist and Game Day project lead. “Our teams are ready to learn, adapt and create to keep pace with the ever-changing needs of our customers.”
Learn more about our Technology & Operations team and other exciting initiatives we are working on.