While zero trust guidance for enterprise information technology (EIT) systems is well established, its direct application to operational technology (OT) environments is problematic due to fundamental differences in system architecture and operational priorities. Zero trust frameworks tailored to the unique requirements of OT systems are just beginning to emerge. The Software Engineering Institute (SEI) is pioneering research into the application of zero trust principles within weapon system environments with embedded OT. In this blog post, we explore a specific case study and examine how findings from our research on weapon systems driven by embedded OT translate to the broader OT landscape.
Zero trust is an evolving set of cybersecurity paradigms that move defenses from static, network-based perimeters to a focus on users, assets, resources, and flows within an enclave. Zero trust assumes there is no implicit trust granted to assets or user accounts based solely on their physical or network location.
In our research, we identified opportunities for zero trust integration in weapon systems OT by analyzing how the core concepts of foundational security principles, originally developed for EIT, can fit the unique OT landscape. The initiative stems from a recognized need among Department of War (DoW) stakeholders for guidance in this area.
The preliminary phase of our work involved a comprehensive examination of foundational security paradigms and zero trust principles to determine their applicability to the unique requirements of weapon systems. The findings of this work were published in the paper Tailoring Security and Zero Trust Principles to Weapons System Environments.
Utilizing the insights from the DoW’s recently published guidance Zero Trust for Operational Technology, we are continuing to tailor and adapt zero trust concepts to address OT concerns in weapon systems. Weapon systems can be considered a specific application of OT, and as such, our findings will offer valuable insights to help advance the implementation of cybersecurity in a zero trust framework across the broader OT domain. Weapon systems, like other OT domains, must meet stringent real-time performance requirements that can’t be met with standard, IT-focused principles. We use our weapon systems analysis to help define the practical boundaries needed to protect complex OT environments.
Securing the Grid: The Commerce Energy Case Study
To illustrate our points in this blog post, we use a case study focused on the digital substations of Commerce Energy, a fictional utility firm. A substation is part of the broader generation, transmission, and distribution system. Its function is to step down high voltages from the transmission system (bulk power) to feed local distribution circuits that serve the dynamic demands of homes and small businesses. A typical substation governs the protection, monitoring, and automation of all transformers and breakers directly involved in transporting bulk electricity.
Commerce Energy’s automatic control systems manage subsystem data and communicate with intelligent electronic devices (IEDs), relays, and other equipment. A web-based human-machine interface (HMI) is used to support human operators for local and remote monitoring, control, and annunciation for substations and other processes. The Supervisory Control and Data Acquisition (SCADA) system provides high-level views for monitoring overall grid stability and power flow and managing switching operations in substations.
Controls for Commerce Energy’s substations are organized into distinct levels following the Purdue model, which enables Commerce Energy’s substation communications to be structurally compartmentalized. Commerce Energy relies on these isolated enclaves at each level, where traffic is restricted through segmentation and access controls. While these controls have been effective to date, in our scenario the rising risks to critical infrastructure are prompting new concerns: lateral movement, the integrity of signals sent to control devices, the actual security posture of remote connections, and devices already in the system that may be compromised. There are also concerns about potential “blind spots” within older equipment. Seeking to reinforce its defenses, Commerce Energy is considering a zero trust initiative, starting with a threat analysis.
Figure 1: Commerce Energy OT Network Architecture
Critical Concerns in Securing Operational Technology
Critical infrastructure, more generally, faces a broad and evolving range of cyber and physical dangers, from systemic weaknesses to sophisticated nation-state sabotage. The dangers include intentional threats (hacktivists, organized crime), insider threats, and accidental, negligent, or natural hazards. To help organizations make informed decisions about zero trust defenses, the Cloud Security Alliance (CSA) recently published guidelines for applying zero trust principles within OT systems. The CSA guidance highlights the main drivers behind malicious interest in OT:
- Regulatory and Compliance Pressure that may not align with effective cybersecurity practices
- Insider Threats, whether acting maliciously or through negligence
- Supply Chain Vulnerabilities, which can introduce malicious elements into systems
- High Impact in the form of physical destruction and damage
- Interconnected and Interdependent Systems where a breach in one area can cascade into others
- Economic Motivations where attackers seek monetary gain
- Cyber Espionage where intelligence on a country’s net power is gathered
- Political Motivations to destabilize a nation or place demands on governments
- Easy Targets such as legacy technologies
- Nation-State Cyber Warfare to gain a strategic advantage without use of traditional military means
- Physical Security gaps at sites that are often exposed and under-guarded
Commerce Energy integrated the threats listed in the CSA guidance with their own specialized findings to broaden their security profile. Commerce Energy primarily aligns with three of CSA’s listed threat categories: insider threats, supply chain vulnerabilities, and nation-state actors. For Commerce Energy, ransomware represents a rapidly escalating, high-impact threat, further compounded by critical vulnerabilities within their aging, legacy software and hardware infrastructure. After analyzing their specific OT threat landscape, they pinpointed five unique areas of concern:
- Advanced persistent threats (APTs). APTs are primarily nation-state actors, state-sponsored groups, or actors with some degree of sponsorship from such groups. Attacks by APTs are sophisticated, highly targeted, and designed to infiltrate OT systems with the goal of disrupting operations, sabotage, or stealing sensitive data. Once successful, they often cause significant political and economic losses, including complete destruction of the target system. These threats are persistent, meaning the attackers quietly maintain undetected access and presence in a network for a long time to study the target system and identify high-value assets and vulnerabilities. APT attacks are among the most destructive security threats to digital substations. Their attack methods are complex and difficult to detect with traditional technologies (e.g., traditional firewalls, intrusion detection systems, and intrusion prevention systems). Recent advances in AI may allow APT-level threats to expand and accelerate.
- Ransomware attacks. The recent increase in ransomware attacks has provided impetus for implementing zero trust as part of modern cybersecurity strategy. Predominantly motivated by money, ransomware operators typically encrypt files and demand payment for a decryption tool to recover the data held hostage. Paying the ransom does not guarantee that the victim will regain access to their data (although ransomware operators do have an incentive to decrypt, since doing so makes their ransom demands credible). Similar to software as a service (SaaS), ransomware-as-a-service is a business model that makes ransomware available to people without technical expertise. Attackers have begun to focus on larger enterprises and critical infrastructure for larger payouts. In OT environments, ransomware can disrupt operations by manipulating or damaging physical equipment such as sensors, actuators, and pumps.
- Insider threat. Security breaches don’t always involve external actors. An insider threat can come from any individual who has authorized access to a system, its data, or its interdependent platforms and components. There is a tendency to think of malicious insiders or disgruntled employees, but that’s not always the case. A well-intentioned individual can be forgetful, complacent, or susceptible to psychological exploitation by attackers. Such inadvertent actions can have far-reaching consequences, causing disruptions across an entire network. For example, employees may create security weaknesses by connecting vulnerable or compromised devices.
Psychological exploitation continues to succeed because, unlike technical vulnerabilities, it exploits ingrained human behaviors, social patterns, and cognitive biases. Social engineering campaigns can target employees at scale, and AI also makes it easier to tailor them to individuals. These campaigns are designed to take advantage of unsuspecting employees who might inadvertently introduce malware that compromises systems and data. Uninformed operators can unknowingly introduce ransomware into an industrial control system (ICS), for example by plugging infected USB drives into control system workstations. Simulated phishing tests show that employees at Commerce Energy are highly susceptible, with many users failing to recognize phishing attempts. Commerce Energy identifies personnel behavior, likely stemming from insufficient training, as its primary vulnerability, particularly lax adherence to USB protocols.
- Legacy systems. Many OT systems still rely on components and software that were not developed to withstand the current threat landscape and are therefore easily exploited by modern attack methods. The term legacy systems describes outdated or antiquated technology that is still in use and might not have received recent updates. This can include server and workstation operating systems, outdated programming languages, and insecure designs. In the critical infrastructure domain, what counts as “legacy” depends on the technology reference point. Legacy can mean purely electromechanical equipment, such as mechanical relay coils and contacts, or analog equipment with copper wiring between switchyard equipment and control rooms. Microprocessor-based relays and other processor-based technology (e.g., IEDs) replaced these legacy coils, contacts, and analog equipment. Many of these early-generation microprocessor-based devices now represent a weak link for today’s cybersecurity requirements, often because they were designed to operate within secure “air gapped” enclaves. For example, legacy IEDs may have unencrypted firmware and use serial communication and proprietary protocols that lack basic authentication and integrity checks.
Commerce Energy maintains critical workloads on a mix of modern and legacy infrastructure. Some of its substations still rely on older devices that run legacy firmware and do not use standardized communication protocols for data exchange. Replacing all of this equipment would require extensive changes to the infrastructure and is not a current priority given cost and reliability considerations. A complete rebuild would require keeping each substation in service while the new infrastructure is built, then re-running all cables one circuit at a time until every circuit is fed from the new substation.
- Supply chain. Supply chain complexity makes it harder to respond to software vulnerabilities. Every product is built from externally sourced components, and components within components can be nested several layers deep, making it hard to gain full visibility into everything that makes up a product. Modern websites, for example, may include hundreds of separately developed components. Managed service arrangements associated with cloud-based products (software-, infrastructure-, and platform-as-a-service) create an even broader supply chain, expanding the attack surface and giving threat actors another avenue of compromise through a third party. The global supply chain adds serious risks for both IT and OT systems. Challenges include counterfeit hardware, unauthorized modifications, and embedded malicious components from original equipment manufacturers (OEMs), as well as components or services that have not received updates addressing new kinds of threats and vulnerabilities. Another supply chain vulnerability Commerce Energy faces is “last-mile” logistics: deliveries of protective relays, controllers, and other equipment from vendors. Once this equipment leaves the vendor, there is a visibility gap, creating an in-transit tampering risk in which attackers exploit the “trust gap” in the delivery process.
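A receipt-time integrity check is one way to narrow the in-transit trust gap for firmware, although it does nothing against tampered hardware. The sketch below assumes that the vendor publishes a SHA-256 digest for each firmware image over a channel separate from the shipment (for example, an authenticated vendor portal); that workflow and the function names `sha256_of_file` and `verify_delivery` are illustrative assumptions, not a description of Commerce Energy’s receiving process.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large firmware images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_delivery(image: Path, published_sha256: str) -> bool:
    """Accept a delivered image only if it matches the vendor-published digest.

    Assumption: `published_sha256` was obtained over a channel separate from
    the shipment itself, so an in-transit attacker cannot alter both.
    """
    expected = published_sha256.strip().lower()
    actual = sha256_of_file(image)  # hexdigest() is already lowercase
    # compare_digest performs a constant-time comparison of the two hex strings
    return hmac.compare_digest(actual, expected)


# Usage: simulate a delivered image, then a tampered copy of it.
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "relay_fw.bin"
    image.write_bytes(b"relay firmware v1.2")
    published = hashlib.sha256(b"relay firmware v1.2").hexdigest()  # stands in for the vendor digest
    print(verify_delivery(image, published))  # True
    image.write_bytes(b"relay firmware v1.2 + implant")
    print(verify_delivery(image, published))  # False
```

The check is only as trustworthy as the channel that delivers the reference digest; signed firmware with a vendor public key provisioned in advance is a stronger variant of the same idea.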
From Blind Spots to Blueprints
As the final stage of their threat analysis, Commerce Energy mapped out every identified entry point into their infrastructure. The mapping identified potential points of compromise existing across all levels of interconnected OT assets and the supply chain. Cyber threats to their substations, which they had always considered isolated, can arrive through vendors, firmware updates, workstations, and networked devices already inside the perimeter. While the Purdue representation provides a foundational blueprint for segmenting their systems, counting only on isolation and access controls at each level is no longer sufficient.
Figure 2: Commerce Energy Threat Attack Surface
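Mapping entry points becomes actionable once permitted paths are treated as a graph: any asset reachable from an entry point is exposed to it. The sketch below runs a breadth-first search over a hypothetical flow map (every asset name is invented for illustration) and shows how removing a single same-level path between two HMIs cuts an attacker’s route to a relay.

```python
from collections import deque

# Hypothetical flow map: an edge A -> B means traffic from A to B is permitted today.
ALLOWED_FLOWS = {
    "vendor_laptop": ["remote_access_gw"],
    "remote_access_gw": ["eng_workstation"],
    "eng_workstation": ["hmi_1", "historian"],
    "hmi_1": ["ied_feeder_a", "hmi_2"],  # hmi_1 -> hmi_2 is a same-level path
    "hmi_2": ["ied_feeder_b"],
    "historian": [],
    "ied_feeder_a": [],
    "ied_feeder_b": [],
}


def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first search: every asset an attacker at `start` could reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def without_edge(graph: dict[str, list[str]], src: str, dst: str) -> dict[str, list[str]]:
    """Return a copy of the map with one permitted flow removed (e.g., a new same-level block)."""
    return {node: [n for n in nbrs if not (node == src and n == dst)]
            for node, nbrs in graph.items()}


print(sorted(reachable(ALLOWED_FLOWS, "vendor_laptop")))
# ['eng_workstation', 'historian', 'hmi_1', 'hmi_2', 'ied_feeder_a', 'ied_feeder_b', 'remote_access_gw']
segmented = without_edge(ALLOWED_FLOWS, "hmi_1", "hmi_2")
print(sorted(reachable(segmented, "vendor_laptop")))
# ['eng_workstation', 'historian', 'hmi_1', 'ied_feeder_a', 'remote_access_gw']
```

A flow graph captures only network paths; a fuller analysis would also need to account for shared credentials, shared services, and removable media, which this sketch does not model.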
Mission Focused Approach to Applying Zero Trust Strategy
In 2022, the President’s National Security Telecommunications Advisory Committee (NSTAC) outlined a five-step, systematic approach for securing OT and ICS:
- Define the Protect Surface: identifying the Data, Applications, Assets, and Services (DAAS) elements to protect
- Map the Transaction Flows: mapping the transaction flows to and from the protect surface
- Build a Zero Trust Architecture: designing the zero trust architecture to support the DAAS elements and transaction flows
- Create a Zero Trust Policy: identifying the person and non-person entities that are granted access
- Monitor and Maintain the Network: inspecting and logging all traffic
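Steps 1, 2, 4, and 5 can be illustrated as data plus a default-deny check; Step 3, the architecture itself, is not something a snippet can capture. In this minimal sketch, the DAAS elements, entities, and operation labels are hypothetical, and a Python list stands in for a real logging pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str       # person or non-person entity requesting access
    destination: str  # DAAS element or service being accessed
    operation: str    # hypothetical operation labels such as "read" or "write"


# Step 1 (hypothetical): DAAS elements that make up the protect surface.
PROTECT_SURFACE = {"scada_server", "ied_feeder_a", "historian_db"}

# Step 2 (hypothetical): transaction flows mapped to and from the protect surface.
MAPPED_FLOWS = {
    Flow("scada_server", "ied_feeder_a", "read"),
    Flow("scada_server", "ied_feeder_a", "write"),
    Flow("scada_server", "historian_db", "write"),
    Flow("operator_jdoe", "scada_server", "read"),
}


def flows_touch_surface(flows: set[Flow], surface: set[str]) -> bool:
    """Sanity check for Step 2: every mapped flow starts or ends on the protect surface."""
    return all(f.source in surface or f.destination in surface for f in flows)


def decide(flow: Flow, authenticated: bool) -> str:
    """Step 4: explicit allowlist over who, what, and how; everything else is denied."""
    if authenticated and flow in MAPPED_FLOWS:
        return "allow"
    return "deny"


def enforce(flow: Flow, authenticated: bool, log: list) -> str:
    """Step 5: record every decision, allowed or denied, for monitoring."""
    verdict = decide(flow, authenticated)
    log.append((flow, verdict))
    return verdict


audit_log: list = []
print(enforce(Flow("scada_server", "ied_feeder_a", "write"), True, audit_log))  # allow
print(enforce(Flow("hmi_2", "ied_feeder_a", "write"), True, audit_log))         # deny
```

The unmapped `hmi_2 -> ied_feeder_a` write is denied even from an authenticated source; that default-deny behavior, rather than network location, is what distinguishes this from perimeter-based trust.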
The SEI is emphasizing a mission-focused approach to OT cybersecurity, in which the appropriate zero trust technology is incorporated into the entire system lifecycle to achieve the objectives of that OT system’s unique mission. Complementary to Steps 1 and 2, a mission-focused approach provides the essential context for Step 3.
Building a zero trust architecture requires a comprehensive understanding of the system’s operational landscape. What is its intended purpose or objective? Are there different modes of operation? What are the distinct operational scenarios for the system? Who are the operators or end users of the system? What conditions influence the system’s behavior at any point in time? Are there dependencies on external environments for things like maintenance or support? What are the system’s unique challenges or limitations? Which threat actors or techniques is the system most exposed to? A mission-focused approach involves analyzing a system and translating that mission information into the specific technical requirements needed to build a zero trust architecture. In the next section, we apply the SEI’s mission-focused methodology for making informed decisions about zero trust implementation to the Commerce Energy case.
Gaining Visibility into the Unique OT Environment
Security principles, including zero trust principles, are best understood when viewed from the perspective of the operating environments where they are to be applied. As outlined in our paper, the SEI is sharpening its focus on five key factors of an OT environment, identified by the DoW, that are important to understand prior to examining security and zero trust frameworks: mission context, system attributes, threat environment, tradeoff space, and mission dependencies. Understanding an OT system’s environment allows security deployments to align with the system’s unique contextual factors, thereby enhancing its ability to achieve its mission securely.
Mission Context
Analysis of mission context is intended to provide a clear understanding of the purpose, goals, and operational environment in which a system is designed, developed, deployed, operated, and maintained. Mission context is captured through the mission threads, activities, and processes that define the mission, detailing the critical capabilities and interactions required for mission success. DAAS elements are the foundational components and enablers of mission threads, directly supporting the activities and processes that define a mission.
The substations’ primary mission is to safely transform, regulate, and distribute electricity between generation sources and end users. Mission scenarios would describe voltage regulation, load distribution, and fault protection. Mission context gives stakeholders a way to understand the consequences of security threats and attacks.
System Attributes
Zero trust guidance for EIT is often unsuitable for operational technology environments because of significant differences in architecture, the diverse and specialized nature of OT components, equipment age, process criticality, the requirement for continuous availability, and legacy systems. The DoW has identified five system-specific attributes that can help to evaluate a system’s ability to accommodate zero trust capabilities:
- Dynamic configurability. Continuous monitoring and dynamic policy enforcement require near real-time reconfigurability. The system must have adequate flexibility to configure system-level changes relating to governance, trust relationships, workflows, and access policies to implement zero trust capabilities in near real time. In our substation example, if a system operator logs into an HMI, a policy engine might algorithmically evaluate a number of risk factors, such as the workstation’s current security patch level, anti-malware scan status, MAC address validation, security certificate validation, and/or access authorization to the specific network subnet. Furthermore, this access decision is continually re-evaluated over time. The degree of dynamic configurability required depends on the risk reduction these specific safeguards provide.
- Design/retrofit flexibility. Implementing zero trust might necessitate new technologies or innovations, which may require an architectural revamp or retrofit of legacy systems. The system must have adequate flexibility to enable changes to engineering design or retrofits to an existing system to implement zero trust capabilities. Commerce Energy’s substation network is a hybrid environment with a modern SCADA system and a legacy electrical substation monitoring system that is used to monitor several parameters of approximately 100 secondary substations. Each secondary substation relies on outdated, proprietary protocols that cannot be integrated into the modern central monitoring system. This makes it difficult to continuously track the health and status of these electrical assets.
- Size, weight, and power (SWaP). Size, weight, and power constraints can create immutable boundaries that thwart modification of engineering designs or changes to operational systems to implement zero trust capabilities. Commerce Energy would like to implement more granular controls to ensure that even if a Purdue model level 2 PLC or IED is compromised, it cannot interact with a Purdue model level 1 controller without passing real-time authorization and identity checks. Commerce Energy’s secondary substations, however, have ICS devices (IEDs, PLCs, and sensors) that run on protocols lacking support for granular access control and identity management, so these devices must instead rely on external mechanisms for zero trust enforcement.
- Latency tolerance. Persistent access management and other zero trust implementations may add latency, creating bottlenecks in systems that cannot tolerate delay. Systems must be able to absorb any delay introduced by zero trust capabilities and still meet system performance requirements. Consider malware detection, which may involve real-time scanning and automatic updates to help protect against online threats like phishing and malicious websites. Commerce Energy must determine whether antivirus software will interfere with the real-time operations and critical processes that their automation system network requires. Many legacy systems were implemented without sufficient “headroom” in processing capacity and timing margin to accommodate upgrades such as zero trust controls.
- IT/OT centricity. An analysis of IT/OT centricity focuses on identifying OT components that are IT-like, which increases the likelihood that IT security principles can be carried over. The analysis also highlights obstacles to implementing meaningful zero trust capabilities. Depending on its attribute profile, an OT system may be suitable for implementing only certain zero trust capabilities, but not others, because of specific system constraints. These system attributes, together with operational and programmatic considerations, will drive the cost-benefit analysis of zero trust approaches.
Commerce Energy has a mix of IT-centric support and control systems and OT-centric devices and controllers. The HMIs are built on an IT-centric Windows platform whose built-in capabilities allow zero trust controls, such as granular access management, to be deployed on the device itself. Their older OT-centric devices and controllers, by contrast, have limited processing power and memory and run on proprietary protocols.
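One way to make the attribute analysis concrete is a placement rule that decides, per device and per control, whether a control can run on the device, must be enforced externally, or has to be deferred. The rule, thresholds, and numbers below are hypothetical, chosen only to mirror the contrast between Commerce Energy’s Windows-based HMIs and its older controllers. The sketch also assumes that a control adds the same latency wherever it is placed, which a real deployment would need to measure.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    name: str
    it_centric: bool          # IT/OT centricity: can it host IT-style security agents?
    reconfigurable: bool      # dynamic configurability: can policy change at runtime?
    retrofit_possible: bool   # design/retrofit flexibility: can an external enforcer be added?
    latency_budget_ms: float  # latency tolerance: delay its function can absorb
    spare_cpu_pct: float      # SWaP headroom: unused processing capacity


@dataclass
class Control:
    name: str
    added_latency_ms: float   # assumed to be the same on-device or external
    cpu_cost_pct: float       # load if the control runs on the device itself


def place(control: Control, dev: DeviceProfile) -> str:
    """Hypothetical rule: on-device if the device can host it, else external, else defer."""
    if control.added_latency_ms > dev.latency_budget_ms:
        return "defer"  # no placement meets timing; track as residual risk
    if dev.it_centric and dev.reconfigurable and control.cpu_cost_pct <= dev.spare_cpu_pct:
        return "on-device"
    if dev.retrofit_possible:
        return "external"  # e.g., a gateway or inline enforcement point in front of the device
    return "defer"


# All figures below are invented for illustration.
hmi = DeviceProfile("hmi_windows", True, True, True, latency_budget_ms=500, spare_cpu_pct=40)
legacy_ied = DeviceProfile("ied_legacy", False, False, True, latency_budget_ms=4, spare_cpu_pct=2)
session_check = Control("session_policy_check", added_latency_ms=2, cpu_cost_pct=3)
av_scan = Control("antivirus_scan", added_latency_ms=50, cpu_cost_pct=20)

for dev in (hmi, legacy_ied):
    for ctl in (session_check, av_scan):
        print(dev.name, ctl.name, place(ctl, dev))
```

With these assumed figures, a session policy check lands on the HMI itself but on an external enforcement point for the legacy IED, while an antivirus scan that would exceed the IED’s 4 ms budget is deferred entirely.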
Threat Environment
The threat environment includes the full range of potential threats (internal and external) that can lead to adverse mission impacts and the context in which these threats operate. The goal is to design security controls that are customized to the threat landscape targeting the specific system.
For Commerce Energy, the attack surface extends across critical components, including SCADA systems, communication gateways, IEDs, and HMIs. The attack surface can expand further as information is shared more broadly, as with third-party access to data or systems.
Tradeoff Space
A tradeoff space refers to the range of possible solutions or design choices that must be analyzed to strike a balance among competing requirements or objectives. The systematic analysis of competing requirements (i.e., requirements of the operational system and required resources for the proposed solution) helps to determine where new deployments in one area might produce risks or problems in another.
The tradeoff space emerges from the combined influence of the mission context, system attributes, and threat environment, which fundamentally inform key decisions. These factors need to be revisited periodically; for example, changes in technology, funding, or available resources may change the tradeoff space. Optimal effectiveness and resilience are achieved by carefully aligning and prioritizing the implementation of solutions based on the tradeoff space.
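For a small set of candidate controls, the tradeoff analysis can be made explicit as a constrained selection problem. The sketch below exhaustively searches every subset of options for the highest total risk-reduction score that fits both a cost budget and a latency budget on the protected control path. The options, scores, costs, and latencies are invented for illustration; the approach itself is one possible technique, not a scoring method prescribed by the SEI or the DoW.

```python
from itertools import combinations
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    risk_reduction: int      # analyst-assigned score on an arbitrary scale
    cost: int                # e.g., thousands of dollars
    added_latency_ms: float  # delay added to the protected control path


# Hypothetical candidates; all figures are invented for illustration.
OPTIONS = [
    Option("mfa_remote_access", 8, 40, 0.0),
    Option("same_level_segmentation", 7, 60, 0.5),
    Option("inline_auth_gateway", 6, 80, 2.0),
    Option("endpoint_agent_on_hmi", 5, 30, 0.0),
]


def best_portfolio(options, budget, latency_budget_ms):
    """Exhaustively pick the subset with the highest total score that fits both budgets."""
    best, best_score = (), 0
    for r in range(1, len(options) + 1):
        for combo in combinations(options, r):
            fits_cost = sum(o.cost for o in combo) <= budget
            fits_latency = sum(o.added_latency_ms for o in combo) <= latency_budget_ms
            if fits_cost and fits_latency:
                score = sum(o.risk_reduction for o in combo)
                if score > best_score:
                    best, best_score = combo, score
    return [o.name for o in best], best_score


print(best_portfolio(OPTIONS, budget=150, latency_budget_ms=2.0))
# (['mfa_remote_access', 'same_level_segmentation', 'endpoint_agent_on_hmi'], 20)
```

Exhaustive search grows as 2^n, which is fine for a handful of options but not for dozens. It also assumes that scores, costs, and latencies add up independently, an assumption a real analysis should check.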
Mission Dependencies
Systems often exist within a larger context as they interact with other systems as part of a broader ecosystem. Commerce Energy’s substations depend on an Outage Management System (OMS) that works in conjunction with the SCADA system to detect, analyze, and report outages in real-time. Other substation dependencies may include geographic information systems, advanced metering systems, and weather forecasting systems. It is important to understand a system’s boundaries and how it must interact with other systems to assess and manage dependency risk.
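Dependency risk can be explored by asking which mission functions lose support when one element fails or is compromised, directly or transitively. The sketch below iterates to a fixed point over a hypothetical dependency map loosely based on the OMS, SCADA, geographic information, advanced metering, and weather forecasting dependencies named in this section. The element names, edges, and system boundary are assumptions for illustration.

```python
# Hypothetical dependency map: element -> elements it depends on.
DEPENDS_ON = {
    "fault_protection": ["ied_relays", "scada"],
    "load_distribution": ["scada", "oms"],
    "outage_reporting": ["oms"],
    "oms": ["scada", "gis", "ami", "weather_feed"],
    "scada": [],
    "ied_relays": [],
    "gis": [],
    "ami": [],
    "weather_feed": [],
}
# Assumed system boundary: these elements sit outside the substation system.
EXTERNAL = {"gis", "ami", "weather_feed"}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Everything that depends on `failed`, directly or transitively (fixed-point iteration)."""
    impacted: set[str] = set()
    changed = True
    while changed:
        changed = False
        for element, deps in depends_on.items():
            if element in impacted:
                continue
            if failed in deps or impacted.intersection(deps):
                impacted.add(element)
                changed = True
    return impacted


def boundary_crossings(depends_on: dict[str, list[str]], external: set[str]) -> list[tuple[str, str]]:
    """Dependencies that cross the system boundary and need their own risk treatment."""
    return sorted((e, d) for e, deps in depends_on.items() for d in deps if d in external)


print(sorted(impacted_by("weather_feed", DEPENDS_ON)))
# ['load_distribution', 'oms', 'outage_reporting']
print(boundary_crossings(DEPENDS_ON, EXTERNAL))
# [('oms', 'ami'), ('oms', 'gis'), ('oms', 'weather_feed')]
```

In this toy map, losing an external weather feed never touches fault protection, but it does propagate through the OMS to outage reporting and load distribution; that kind of indirect path is what a boundary analysis is meant to surface.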
The Roadmap to Resilience – Strategic Control Selection for ICS
Commerce Energy is on their way to reducing their attack surface and increasing visibility into their security environment in a phased modernization centered on a zero trust architecture. They already had some controls in place that qualify as components of zero trust. After auditing their assets, they took the following actions:
- Secured high-risk assets (design stations, operator workstations, historians) with on-device zero trust controls that enable precise, granular access management.
- Imposed logical boundaries and strict access controls between devices at the same level to block lateral movement.
- Implemented stringent multi-factor authentication (MFA) and now enforce secure, centralized management of third-party remote connections. When an operator attempts to authenticate to their SCADA client, the policy engine evaluates the applicable zero trust policies against the current security risk state.
- Integrated their legacy infrastructure with their modern system through an intermediary layer that offers a standardized interface for interacting with multiple devices and protocols, enabling interoperability across sensor networks. This approach provides temporary bridging until modern digital signaling is deployed in the secondary substations and integrated with the zero trust architecture.
Commerce Energy judges that these changes carry manageable administrative overhead and technical complexity, within acceptable operational risk tolerances. The improvements are part of an incremental zero trust maturity roadmap, which is far better than taking no action.
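The continuous evaluation described for HMI and SCADA client logins can be sketched as a function that checks each risk signal, plus a session that re-runs the check instead of trusting the initial login. The signals mirror those listed under dynamic configurability; the thresholds, inventory, user, and subnet names are hypothetical. MAC addresses can be spoofed, so MAC validation is best treated as a weak signal alongside stronger ones such as certificates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Posture:
    """Risk signals gathered at login and on every re-check (hypothetical fields)."""
    mfa_passed: bool
    cert_valid: bool
    patched: bool
    last_scan: datetime
    mac: str


KNOWN_MACS = {"00:1a:2b:3c:4d:5e"}                     # hypothetical asset inventory
AUTHORIZED = {("operator_jdoe", "substation_ops_net")}  # hypothetical user-to-subnet grants
MAX_SCAN_AGE = timedelta(hours=24)                      # hypothetical freshness threshold


def evaluate(user: str, subnet: str, p: Posture, now: datetime) -> list[str]:
    """Return the failed checks; an empty list means access is allowed."""
    failures = []
    if not p.mfa_passed:
        failures.append("mfa")
    if not p.cert_valid:
        failures.append("certificate")
    if not p.patched:
        failures.append("patch_level")
    if now - p.last_scan > MAX_SCAN_AGE:
        failures.append("scan_stale")
    if p.mac.lower() not in KNOWN_MACS:  # weak signal: MACs can be spoofed
        failures.append("unknown_mac")
    if (user, subnet) not in AUTHORIZED:
        failures.append("not_authorized")
    return failures


class Session:
    """Access is never granted once and forgotten; each re-check can revoke it."""

    def __init__(self, user: str, subnet: str):
        self.user, self.subnet, self.active = user, subnet, False

    def recheck(self, posture: Posture, now: datetime) -> list[str]:
        failures = evaluate(self.user, self.subnet, posture, now)
        self.active = not failures
        return failures


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
posture = Posture(True, True, True, last_scan=now - timedelta(hours=2), mac="00:1A:2B:3C:4D:5E")
session = Session("operator_jdoe", "substation_ops_net")
failures = session.recheck(posture, now)
print(failures, session.active)  # [] True
failures = session.recheck(posture, now + timedelta(hours=30))
print(failures, session.active)  # ['scan_stale'] False
```

Nothing about the operator changed between the two checks; the session was revoked only because the workstation’s malware scan aged past the assumed threshold, which is the practical difference between continuous evaluation and a one-time login decision.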
Looking Ahead: Sustaining Resilience Through Mission-Focused Defense
The cyber threat landscape for OT is constantly evolving, which necessitates a strategy of continuous focus, reassessment, and adaptation. In mixed-capability environments like Commerce Energy’s, there is no one-size-fits-all approach for implementing zero trust across an organization’s entire OT/ICS environment. Rather, the components of zero trust need to be separated and applied where they can be deployed. The ability and extent to which zero trust components can be deployed must be assessed on a site, facility, and subsystem basis. Moving forward, zero trust should be part of the design and planning phases.
Effective OT security requires analyzing all potential threats and the context in which they operate and then making risk-based decisions. A mission-focused zero trust strategy prompts organizations to continually reassess cyber threats, establish defense priorities based on the greatest risks, and make informed decisions about security investments. Understanding the operational environment from a mission perspective enables informed and effective design choices grounded in a systematic analysis of the tradeoffs between essential cybersecurity protections and functional interoperability requirements. The objective is to optimize security alongside performance and interoperability requirements while also managing budget and schedule constraints.
Effective security requires a focused strategy. Security deployments can be costly, add to the complexity of an OT environment, and possibly affect the system’s behaviors and effects, including safety, availability, and reliability. Each organization must determine its risk profile (its tolerance for potential OT cybersecurity threats in its production environments) and prioritize the implementation of solutions that best mitigate those threats. There will be design choices to make, based on a systematic analysis of the tradeoffs among the system’s requirements and objectives.
Keep in mind that recommendations from a mission-focused assessment do not need to be deployed all at once. For OT/ICS environments, implementing zero trust is an evolutionary process that requires coordination among multiple business units and disciplines. A phased, strategic implementation is more effective and sustainable in the long run. Contextual awareness of the system makes it possible to identify capabilities that can be deployed immediately and to anticipate and plan for future challenges. Implementing zero trust in stages across an organization’s entire OT/ICS environment will likely take years and require careful planning and full support from leadership and all operational areas. Some organizations may also find that certain legacy systems and facilities cannot feasibly be updated to support zero trust. These organizations will need to account for the residual risk from such facilities wherever they deem zero trust controls necessary for risk mitigation.