Use drills to exercise real emergency scenarios
Emergency events requiring backup control center activation can cause chaos. By adding a tabletop-style discussion of an actual event scenario to your next backup control center drill, you can bring in the realism of an actual event without the chaos, get more value out of your drills, and better prepare your organization for the unexpected.
Backup control center drills are an important exercise for any transmission system operator that uses an energy management system (EMS) to operate the bulk power system. Drills are the best way to test the readiness of backup systems, processes, and people. They also give your organization and staff greater trust in those systems and processes when they are needed.
Throughout my career, I have had the opportunity to participate in many backup control center drills. It was not until I experienced an actual event that required enabling a backup site that I realized what all those drills were missing: realism. During a drill, teams have pre-scheduled calls and have time to pre-check and prepare the backup site and systems. Servers and processes are running, necessary network paths are open, and any required on-site staff are in place. During an actual event, there is no time to check any of that beforehand, and there is very little time to gather the required information before deciding to begin the backup site activation and restore visibility or stability to your critical system.
Realism can be added to your next drill by having tabletop-style discussions about how an actual event requiring backup control center activation would occur. (Realism can also be added by holding unannounced or surprise drills – which I do not recommend. In my opinion and experience, surprise drills with actual activation of a backup site have the potential to add more instability and chaos than they are worth.)
Necessary components of a realistic tabletop exercise
The team: To start, bring together members from every group that would be involved in a backup control center activation. This would likely include system operators, EMS support staff, networking staff, and any coordinating staff such as management, incident response, or your security operations center, depending on your organization’s incident response structure.
The scenario: Next, discuss with this team what sort of events could occur that would require activation of the backup site and how they might unfold. These events could be physical issues such as a fire, a burst pipe, loss of building power, or incoming severe weather. Or they could be cyber issues, such as a bad software patch, failed network or server infrastructure, or disrupted communications.
The differences: Discussing the differences between a regularly scheduled drill and an actual event is the most valuable part of this entire process. Drills are typically scheduled well in advance, and all necessary parties know when backup site activation is going to happen. An actual event requires you to know the following:
- Who at your organization has the authority to decide when to activate the backup site?
- What information would they need to make that decision?
- How would that information be gathered?
- Who needs to be informed prior to activation?
- Who needs to be informed after activation?
- Are there any time constraints or reporting requirements you might face?
- If the primary site is completely unavailable, does that change your procedure for enabling the backup site?
- Does anyone need to travel to the backup site, and if so, how long would that take?
During a drill, these questions do not often come up, but during an actual event they become critical questions to answer.
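One way to keep these questions from being lost after the discussion is to record them in a simple worksheet and track which ones still lack an answer. The short Python sketch below is only an illustration of that idea, not a prescribed tool. The names TabletopQuestion, new_worksheet, and open_items, the owner field, and the sample answer are all hypothetical. The question text is taken from the list above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Question text comes from the list above; everything else is illustrative.
ACTIVATION_QUESTIONS = [
    "Who has the authority to decide when to activate the backup site?",
    "What information would they need to make that decision?",
    "How would that information be gathered?",
    "Who needs to be informed prior to activation?",
    "Who needs to be informed after activation?",
    "Are there any time constraints or reporting requirements?",
    "If the primary site is completely unavailable, does the procedure change?",
    "Does anyone need to travel to the backup site, and how long would that take?",
]


@dataclass
class TabletopQuestion:
    """One discussion question and the answer the team agreed on."""

    prompt: str
    answer: str = ""  # filled in during the tabletop discussion
    owner: str = ""   # hypothetical field: who follows up on gaps

    @property
    def resolved(self) -> bool:
        # A question counts as resolved once it has a non-blank answer.
        return bool(self.answer.strip())


def new_worksheet() -> list[TabletopQuestion]:
    """Create a blank worksheet with one entry per question."""
    return [TabletopQuestion(prompt) for prompt in ACTIVATION_QUESTIONS]


def open_items(worksheet: list[TabletopQuestion]) -> list[str]:
    """Return the prompts that still have no recorded answer."""
    return [q.prompt for q in worksheet if not q.resolved]


if __name__ == "__main__":
    worksheet = new_worksheet()
    # Placeholder answer for illustration only, not a recommendation.
    worksheet[0].answer = "Placeholder: role named in your activation procedure"
    for prompt in open_items(worksheet):
        print("Still open:", prompt)
```

Anything still listed as open at the end of the tabletop session is a gap to close before the next drill.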
The challenges: Finally, discuss the challenges your organization might face during an actual event that you would not face during a drill.
- How would an issue first be noticed?
- How would the required people be notified and assembled?
- What if key team members are unavailable?
- What if an event occurs after business hours?
Think of other challenges that your organization might face during an event and discuss what people, processes, and technologies are in place to meet those challenges.
Actual events are a race between triaging what caused the issue and restoring visibility or stability to the system as quickly as possible. At some point, enabling a backup site becomes the best option. By exercising different tabletop scenarios, you can practice your backup control center procedures and become comfortable with your system’s capabilities as part of your next drill – saving valuable time during an actual event.
– Clayton Whitacre, Industrial Systems Engineer, Great River Energy and MRO Security Advisory Council member
ABOUT THE AUTHOR:
Clayton Whitacre is an Industrial Systems Engineer at Great River Energy. Prior to his current role, Whitacre spent roughly a decade designing, supporting, and securing energy management SCADA systems as a Sr. Systems Analyst at Great River Energy and, prior to that, as a Software Engineer at Siemens. Whitacre holds an M.S. in Security Technologies and a B.S. in Computer Science from the University of Minnesota – Twin Cities. He also holds a Certified Information Systems Security Professional (CISSP) certification from the International Information System Security Certification Consortium (ISC2). Whitacre has been an active member of MRO’s Security Advisory Council since 2020 and is an active participant in various other security-focused industry groups.
MRO is committed to providing non-binding guidance to industry stakeholders on important industry topics. Subject matter experts from MRO’s organizational groups have authored some of the articles in this publication, and the opinion and views expressed in these articles are those of the author(s) and do not necessarily represent the opinions and views of MRO.