Tabletop Testing: An Overview
What is a Tabletop Test?
Tabletop Tests (TTTs) are an essential element of the programs we support here at Softrams. A TTT brings together members of the Crisis Management Team to simulate their response to a fictitious emergency scenario. In accordance with the Federal Information Security Management Act of 2002 (FISMA) and CMS’ Acceptable Risk Safeguards (ARS) 5.0, a TTT must be completed successfully at least once a year. The ultimate objective of a TTT is to validate the information in the Contingency Plan (CP) and to ensure that designated personnel understand it. According to NIST SP 800-57, a CP is a plan that is maintained for disaster response, backup operations, and post-disaster recovery, to ensure the availability of critical resources and to facilitate the continuity of operations in an emergency. In short, the TTT is a rehearsed response to a fictitious emergency, and the CP is the guide whose steps the team follows to recover from it.
What is a Typical Scenario?
Let’s say that an employee was offboarded from an organization, but the employee’s account credentials were not immediately revoked. A week after leaving, the former employee used those credentials to make malicious changes that rendered a critical application we support non-functional. Yikes! This sounds like a terrible situation, but luckily, we have a well-maintained CP that provides detailed steps the organization follows to restore the system. In fact, the restoration should be an orderly process precisely because we rehearse it in a TTT at least annually. Although this is just one typical scenario, we must be prepared for any scenario that could keep our applications from functioning properly. In other words, expect the unexpected!
Measures of Success
In a real emergency, the response must meet specific recovery parameters. For example, if the Recovery Time Objective (RTO) is 24 hours, the targeted duration between the failure and the point where operations resume must be 24 hours or less. The team must also meet the Recovery Point Objective (RPO), which is the point in time before the disruption to which data must be recoverable. Expressed as a duration, the RPO is the maximum acceptable age of the restored data: any changes made after the restore point may be lost, but never more than the RPO allows.
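To make the two objectives concrete, the sketch below checks a simulated incident against an RTO and an RPO using timestamps. It is a minimal illustration, not a Softrams tool: the 24-hour RTO comes from the example above, while the 4-hour RPO, the function names, and the dates are hypothetical values chosen only for this sketch.

```python
from datetime import datetime, timedelta

# Hypothetical objectives for illustration; real values come from the system's CP.
RTO = timedelta(hours=24)  # max time from failure until operations resume
RPO = timedelta(hours=4)   # assumed: max age of restored data relative to the failure


def meets_rto(failure_at: datetime, resumed_at: datetime, rto: timedelta = RTO) -> bool:
    """True if operations resumed within the Recovery Time Objective."""
    return resumed_at - failure_at <= rto


def meets_rpo(failure_at: datetime, last_backup_at: datetime, rpo: timedelta = RPO) -> bool:
    """True if the restore point is recent enough; the gap is the window of possible data loss."""
    return failure_at - last_backup_at <= rpo


failure = datetime(2024, 3, 1, 9, 0)
print(meets_rto(failure, datetime(2024, 3, 2, 7, 30)))  # 22.5 h to resume -> True
print(meets_rpo(failure, datetime(2024, 3, 1, 2, 0)))   # 7 h of data at risk -> False
```

The two checks are independent: in this example the team restores service in time but from a backup that is too old, so the RTO is met while the RPO is missed.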
Next is the Maximum Tolerable Downtime (MTD): the total amount of time a system can remain disrupted before the affected organization’s existence or operational proficiency is at risk. The Work Recovery Time (WRT) is the maximum tolerable amount of time needed, once the system has been restored, to verify it and confirm that its data is intact and protected. For example, during a real emergency, logs, applications, databases, and all other services undergo a formal testing process to determine whether they are functioning correctly. The MTD is the sum of the RTO and the WRT, and in most scenarios it is 48 hours; with the 24-hour RTO from the example above, that would leave 24 hours of WRT for verification.
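The relationship MTD = RTO + WRT can be sketched as a small calculation. This is an illustrative sketch only; the function names are hypothetical, and the 24-hour WRT is an assumption that reproduces the 48-hour MTD mentioned above.

```python
from datetime import timedelta


def max_tolerable_downtime(rto: timedelta, wrt: timedelta) -> timedelta:
    """MTD is the sum of the Recovery Time Objective and the Work Recovery Time."""
    return rto + wrt


def within_mtd(restore_time: timedelta, verify_time: timedelta, mtd: timedelta) -> bool:
    """True if restoring plus verifying the system fits inside the MTD."""
    return restore_time + verify_time <= mtd


# Assumed values: 24 h RTO (from the example) and 24 h WRT.
mtd = max_tolerable_downtime(timedelta(hours=24), timedelta(hours=24))
print(mtd)  # 2 days, 0:00:00

# Restoration took 20 h (RTO met), but verification took 30 h.
print(within_mtd(timedelta(hours=20), timedelta(hours=30), mtd))  # False
```

The second check shows why the WRT matters on its own: meeting the RTO is not enough if verification runs long enough to push total downtime past the MTD.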
TTT Walkthrough
During a typical TTT, team members are walked through the sequence of steps they would follow in a real disaster. For example, if the first step is that the event itself occurs, team members describe their response as quickly and efficiently as possible. In the fictitious case of a former employee making malicious changes to an application, the leadership team must be notified of the incident. This step also tests the accuracy of the call tree, because everyone in the call tree is responsible for relaying the notification so that all stakeholders are reached. A call tree is a communication model that specifies who notifies whom, so that the right individuals learn about the event and can organize a recovery effort if necessary.
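A call tree is essentially a tree data structure, so one way to rehearse it is to walk it and look for stakeholders that no branch reaches. The sketch below assumes a hypothetical tree with made-up role names; a real call tree would come from the CP. It performs a breadth-first walk, so everyone at one level is contacted before the next level.

```python
from collections import deque

# Hypothetical call tree: each person notifies the people listed under them.
CALL_TREE = {
    "Incident Commander": ["CIO", "Security Lead", "Application Owner"],
    "Security Lead": ["ISSO"],
    "Application Owner": ["Dev Lead", "Ops Lead"],
}


def notification_order(tree: dict[str, list[str]], root: str) -> list[str]:
    """Breadth-first walk of the call tree starting from whoever first learns of the event."""
    order, queue, seen = [], deque([root]), {root}
    while queue:
        person = queue.popleft()
        order.append(person)
        for contact in tree.get(person, []):  # people with no entry notify no one
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return order


def unreachable(tree: dict[str, list[str]], root: str, stakeholders: list[str]) -> list[str]:
    """Stakeholders that no branch of the tree reaches: gaps a TTT should surface."""
    return sorted(set(stakeholders) - set(notification_order(tree, root)))


print(notification_order(CALL_TREE, "Incident Commander"))
print(unreachable(CALL_TREE, "Incident Commander", ["CIO", "Ops Lead", "Contracting Officer"]))
# ['Contracting Officer']
```

In this made-up tree, the "Contracting Officer" is a stakeholder nobody is assigned to call, which is exactly the kind of gap a TTT is meant to expose before a real incident.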
After all stakeholders are notified, the team follows the recovery steps outlined in the Contingency Plan. Once those steps have restored the application, testing and validation must be completed to confirm that the restoration succeeded. When testing is complete and leadership approves the results, a formal decision is made to declare the recovery effort complete.
Wrap-Up
The idea of a Tabletop Test is nothing new, as history shows. Sun Tzu, one of the most important figures in the history of war, once stated, “The general who wins the battle makes many calculations in his temple before the battle is fought. The general who loses makes but few calculations beforehand.” Likewise, Softrams must be calculated and prepared to thwart cybersecurity-related attacks that could disrupt the organizations we support.