User Management Archives - ZPE Systems
https://zpesystems.com/category/improve-network-security/user-management/
Rethink the Way Networks are Built and Managed

Edge Computing Use Cases in Banking
https://zpesystems.com/edge-computing-use-cases-in-banking-zs/ | Tue, 13 Aug 2024

This blog describes four edge computing use cases in banking before describing the benefits and best practices for the financial services industry.


The banking and financial services industry deals with enormous, highly sensitive datasets collected from remote sites like branches, ATMs, and mobile applications. Efficiently leveraging this data while avoiding regulatory, security, and reliability issues is extremely challenging when the hardware and software resources used to analyze that data reside in the cloud or a centralized data center.

Edge computing decentralizes computing resources and distributes them at the network’s “edges,” where most banking operations take place. Running applications and leveraging data at the edge enables real-time analysis and insights, mitigates many security and compliance concerns, and ensures that systems remain operational even if Internet access is disrupted. This blog describes four edge computing use cases in banking, lists the benefits of edge computing for the financial services industry, and provides advice for ensuring the resilience, scalability, and efficiency of edge computing deployments.

4 Edge computing use cases in banking

1. AI-powered video surveillance

PCI DSS requires banks to monitor key locations with video surveillance, review and correlate surveillance data on a regular basis, and retain videos for at least 90 days. Constantly monitoring video surveillance feeds from bank branches and ATMs with maximum vigilance is nearly impossible for humans, but machines excel at it. Financial institutions are beginning to adopt artificial intelligence solutions that can analyze video feeds and detect suspicious activity with far greater vigilance and accuracy than human security personnel.

When these AI-powered surveillance solutions are deployed at the edge, they can analyze video feeds in real time, potentially catching a crime as it occurs. Edge computing also keeps surveillance data on-site, reducing bandwidth costs and network latency while mitigating the security and compliance risks involved with storing videos in the cloud.
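
To make the mechanics concrete, here is a minimal Python sketch of an edge-side video analysis loop. It uses OpenCV's stock HOG person detector as a stand-in for a trained surveillance model; the after-hours policy and the alert_security() hook are illustrative assumptions, not part of any specific vendor product.

```python
# Minimal sketch of an edge-side video analysis loop. OpenCV's stock HOG
# person detector stands in for a trained surveillance model; the
# after-hours policy and alert hook are illustrative assumptions.
import datetime
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

BUSINESS_HOURS = range(8, 18)  # 08:00-17:59 local branch hours (example)

def alert_security(frame, count):
    # Placeholder: push the frame and metadata to your alerting system.
    ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    cv2.imwrite(f"alert_{ts}.jpg", frame)
    print(f"[{ts}] {count} person(s) detected after hours")

cap = cv2.VideoCapture(0)  # on-site camera feed
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    # Toy policy: any human presence outside business hours raises an alert.
    if len(boxes) and datetime.datetime.now().hour not in BUSINESS_HOURS:
        alert_security(frame, len(boxes))
cap.release()
```

Because the loop runs on the branch device itself, frames never leave the site, which is what keeps bandwidth costs down and surveillance data inside the compliance boundary.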

2. Branch customer insights

Banks collect a lot of customer data from branches, web and mobile apps, and self-service ATMs. Feeding this data into AI/ML-powered data analytics software can provide insights into how to improve the customer experience and generate more revenue. By running analytics at the edge rather than from the cloud or centralized data center, banks can get these insights in real-time, allowing them to improve customer interactions while they’re happening.

For example, edge-AI/ML software can help banks provide fast, personalized investment advice on the spot by analyzing a customer’s financial history, risk preferences, and retirement goals and recommending the best options. It can also use video surveillance data to analyze traffic patterns in real-time and ensure tellers are in the right places during peak hours to reduce wait times.

3. On-site data processing

Because the financial services industry is so highly regulated, banks must follow strict security and privacy protocols to protect consumer data from malicious third parties. Transmitting sensitive financial data to the cloud or data center for processing increases the risk of interception and makes it more challenging to meet compliance requirements for data access logging and security controls.

Edge computing allows financial institutions to leverage more data on-site, within the network security perimeter. For example, loan applications contain a lot of sensitive and personally identifiable information (PII). Processing these applications on-site significantly reduces the risk of third-party interception and allows banks to maintain strict control over who accesses data and why, which is more difficult in cloud and colocation data center environments.

4. Enhanced AIOps capabilities

Financial institutions use AIOps (artificial intelligence for IT operations) to analyze monitoring data from IT devices, network infrastructure, and security solutions and get automated incident management, root-cause analysis (RCA), and simple issue remediation. Deploying AIOps at the edge provides real-time issue detection and response, significantly shortening the duration of outages and other technology disruptions. It also ensures continuous operation even if an ISP outage or network failure cuts a branch off from the cloud or data center, further helping to reduce disruptions at remote sites.

Additionally, AIOps and other artificial intelligence technologies tend to use GPUs (graphics processing units), which are more expensive than CPUs (central processing units), especially in the cloud. Deploying AIOps on small, decentralized, multi-functional edge computing devices can help reduce costs without sacrificing functionality. For example, an Nvidia A100 GPU for AIOps workloads costs at least $10k per unit, and comparable AWS GPU instances can cost between $2 and $3 per unit per hour. By comparison, a Nodegrid Gate SR costs under $5k and also includes remote serial console management, OOB, cellular failover, gateway routing, and much more.
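
As a rough illustration of the economics, the break-even math works out as follows (prices are the ballpark figures quoted above, not vendor quotes):

```python
# Back-of-the-envelope break-even math using the ballpark figures above.
# All prices are illustrative, not vendor quotes.
A100_UNIT_COST = 10_000   # on-prem Nvidia A100, per unit
CLOUD_RATE = 2.50         # comparable AWS GPU instance, per unit-hour (midpoint)
GATE_SR_COST = 5_000      # Nodegrid Gate SR, per unit (upper bound)

for label, capex in [("Nvidia A100", A100_UNIT_COST),
                     ("Nodegrid Gate SR", GATE_SR_COST)]:
    hours = capex / CLOUD_RATE
    print(f"{label}: matches cloud spend after {hours:,.0f} hours (~{hours / 730:.1f} months)")

# Nvidia A100: matches cloud spend after 4,000 hours (~5.5 months)
# Nodegrid Gate SR: matches cloud spend after 2,000 hours (~2.7 months)
```

In other words, any always-on edge AI workload outgrows hourly cloud GPU pricing within months, which is why consolidated edge hardware pays off.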

The benefits of edge computing for banking

Edge computing can help the financial services industry:

  • Reduce losses, theft, and crime by leveraging artificial intelligence to analyze real-time video surveillance data.
  • Increase branch productivity and revenue with real-time insights from security systems, customer experience data, and network infrastructure.
  • Simplify regulatory compliance by keeping sensitive customer and financial data on-site within company-owned infrastructure.
  • Improve resilience with real-time AIOps capabilities like automated incident remediation that continues operating even if the site is cut off from the WAN or Internet.
  • Reduce the operating costs of AI and machine learning applications by deploying them on small, multi-function edge computing devices. 
  • Mitigate the risk of interception by leveraging financial and IT data on the local network and distributing the attack surface.

Edge computing best practices

Isolating the management interfaces used to control network infrastructure is the best practice for ensuring the security, resilience, and efficiency of edge computing deployments. CISA and PCI DSS 4.0 recommend implementing isolated management infrastructure (IMI) because it prevents compromised accounts, ransomware, and other threats from laterally moving from production resources to the control plane.

A diagram showing isolated management infrastructure (IMI) with Nodegrid.

Using vendor-neutral platforms to host, connect, and secure edge applications and workloads is the best practice for ensuring the scalability and flexibility of financial edge architectures. Moving away from dedicated device stacks and taking a “platformization” approach allows financial institutions to easily deploy, update, and swap out applications and capabilities on demand. Vendor-neutral platforms help reduce hardware overhead costs to deploy new branches and allow banks to explore different edge software capabilities without costly hardware upgrades.

A diagram showing centralized edge management and orchestration.

Additionally, using a centralized, cloud-based edge management and orchestration (EMO) platform is the best practice for ensuring remote teams have holistic oversight of the distributed edge computing architecture. This platform should be vendor-agnostic to ensure complete coverage over mixed and legacy architectures, and it should use out-of-band (OOB) management to provide continuous remote access to edge infrastructure even during a major service outage.

How Nodegrid streamlines edge computing for the banking industry

Nodegrid is a vendor-neutral edge networking platform that consolidates an entire edge tech stack into a single, cost-effective device. Nodegrid has a Linux-based OS that supports third-party VMs and Docker containers, allowing banks to run edge computing workloads, data analytics software, automation, security, and more. 

The Nodegrid Gate SR is available with an Nvidia Jetson Nano card that’s optimized for artificial intelligence workloads. This allows banks to run AI surveillance software, ML-powered recommendation engines, and AIOps at the edge alongside networking and infrastructure workloads rather than purchasing expensive, dedicated GPU resources. Plus, Nodegrid’s Gen 3 OOB management ensures continuous remote access and IMI for improved branch resilience.

Get Nodegrid for your edge computing use cases in banking

Nodegrid’s flexible, vendor-neutral platform adapts to any use case and deployment environment. Watch a demo to see Nodegrid’s financial network solutions in action.

Watch a demo

AI Orchestration: Solving Challenges to Improve AI Value
https://zpesystems.com/ai-orchestration-zs/ | Fri, 02 Aug 2024

This post describes the ideal AI orchestration solution and the technologies that make it work, helping companies use artificial intelligence more efficiently.

Generative AI and other artificial intelligence technologies are still surging in popularity across every industry, with the recent McKinsey global survey finding that 72% of organizations had adopted AI in at least one business function. In the rush to capitalize on the potential productivity and financial gains promised by AI solution providers, technology leaders are facing new challenges relating to deploying, supporting, securing, and scaling AI workloads and infrastructure. These challenges are exacerbated by the fragmented nature of many enterprise IT environments, with administrators overseeing many disparate, vendor-specific solutions that interoperate poorly if at all.

The goal of AI orchestration is to provide a single, unified platform for teams to oversee and manage AI-related workflows across the entire organization. This post describes the ideal AI orchestration solution and the technologies that make it work, helping companies use artificial intelligence more efficiently.

AI challenges to overcome

The challenges an organization must overcome to use AI more cost-effectively and see faster returns can be broken down into three categories:

  1. Overseeing AI-led workflows to ensure models are behaving as expected and providing accurate results, when these workflows are spread across the enterprise in different geographic locations and vendor-specific applications.
  2. Efficiently provisioning, maintaining, and scaling the vast infrastructure and computational resources required to run intensive AI workflows at remote data centers and edge computing sites.
  3. Maintaining 24/7 availability and performance of remote AI workflows and infrastructure during security breaches, equipment failures, network outages, and natural disasters.

These challenges have a few common causes. One is that artificial intelligence and the underlying infrastructure that supports it are highly complex, making it difficult for human engineers to keep up. Two is that many IT environments are highly fragmented due to closed vendor solutions that integrate poorly and require administrators to manage too many disparate systems, allowing coverage gaps to form. Three is that many AI-related workloads occur off-site at data centers and edge computing sites, so it’s harder for IT teams to repair and recover AI systems that go down due to a networking outage, equipment failure, or other disruptive event.

How AI orchestration streamlines AI/ML in an enterprise environment

The ideal AI orchestration platform solves these problems by automating repetitive and data-heavy tasks, unifying workflows with a vendor-neutral platform, and using out-of-band (OOB) serial console management to provide continuous remote access even during major outages.

Automation

Automation is crucial for teams to keep up with the pace and scale of artificial intelligence. Organizations use automation to provision and install AI data center infrastructure, manage storage for AI training and inference data, monitor inputs and outputs for toxicity, perform root-cause analyses when systems fail, and much more. However, tracking and troubleshooting so many automated workflows can get very complicated, creating more work for administrators rather than making them more productive. An AI orchestration platform should provide a centralized interface for teams to deploy and oversee automated workflows across applications, infrastructure, and business sites.

Unification

The best way to improve AI operational efficiency is to integrate all of the complicated monitoring, management, automation, security, and remediation workflows. This can be accomplished by choosing solutions and vendors that interoperate or, even better, are completely vendor-agnostic (a.k.a., vendor-neutral). For example, using open, common platforms to run AI workloads, manage AI infrastructure, and host AI-related security software can help bring everything together where administrators have easy access. An AI orchestration platform should be vendor-neutral to facilitate workload unification and streamline integrations.

Resilience

AI models, workloads, and infrastructure are highly complex and interconnected, so an issue with one component could compromise interdependencies in ways that are difficult to predict and troubleshoot. AI systems are also attractive targets for cybercriminals due to their vast, valuable data sets and because of how difficult they are to secure, with HiddenLayer’s 2024 AI Threat Landscape Report finding that 77% of businesses have experienced AI-related breaches in the last year. An AI orchestration platform should help improve resilience, or the ability to continue operating during adverse events like tech failures, breaches, and natural disasters.

Gen 3 out-of-band management technology is a crucial component of AI and network resilience. A vendor-neutral OOB solution like the Nodegrid Serial Console Plus (NSCP) uses alternative network connections to provide continuous management access to remote data center, branch, and edge infrastructure even when the ISP, WAN, or LAN connection goes down. This gives administrators a lifeline to troubleshoot and recover AI infrastructure without costly and time-consuming site visits. The NSCP allows teams to remotely monitor power consumption and cooling for AI infrastructure. It also provides 5G/4G LTE cellular failover so organizations can continue delivering critical services while the production network is repaired.
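
Nodegrid devices handle this failover natively, but the underlying logic is straightforward to picture. Below is a conceptual Python sketch of the same pattern on a generic Linux host; the gateway address and modem interface name are illustrative assumptions, not ZPE APIs.

```python
# Conceptual sketch of WAN-to-cellular failover logic on a generic Linux
# host. A Gen 3 OOB device does this natively; the gateway address and
# modem interface name below are illustrative assumptions, not ZPE APIs.
import subprocess
import time

PRIMARY_GW = "203.0.113.1"   # primary ISP gateway (example address)
CELL_IFACE = "wwan0"         # 4G/5G modem interface (assumed name)

def primary_link_up() -> bool:
    # Probing the gateway exercises the primary link even after failover.
    return subprocess.run(
        ["ping", "-c", "3", "-W", "2", PRIMARY_GW],
        capture_output=True,
    ).returncode == 0

def set_default_route(args: list[str]) -> None:
    subprocess.run(["ip", "route", "replace", "default", *args], check=True)

failed_over = False
while True:
    up = primary_link_up()
    if not up and not failed_over:
        set_default_route(["dev", CELL_IFACE])  # fail over to cellular
        failed_over = True
    elif up and failed_over:
        set_default_route(["via", PRIMARY_GW])  # fail back to the primary WAN
        failed_over = False
    time.sleep(30)
```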

A diagram showing isolated management infrastructure with the Nodegrid Serial Console Plus.

Gen 3 OOB also helps organizations implement isolated management infrastructure (IMI), a.k.a, control plane/data plane separation. This is a cybersecurity best practice recommended by the CISA as well as regulations like PCI DSS 4.0, DORA, NIS2, and the CER Directive. IMI prevents malicious actors from being able to laterally move from a compromised production system to the management interfaces used to control AI systems and other infrastructure. It also provides a safe recovery environment where teams can rebuild and restore systems during a ransomware attack or other breach without risking reinfection.

Getting the most out of your AI investment

An AI orchestration platform should streamline workflows with automation, provide a unified platform to oversee and control AI-related applications and systems for maximum efficiency and coverage, and use Gen 3 OOB to improve resilience and minimize disruptions. Reducing management complexity, risk, and repair costs can help companies see greater productivity and financial returns from their AI investments.

The vendor-neutral Nodegrid platform from ZPE Systems provides highly scalable Gen 3 OOB management for up to 96 devices with a single, 1RU serial console. The open Nodegrid OS also supports VMs and Docker containers for third-party applications, so you can run AI, automation, security, and management workflows all from the same device for ultimate operational efficiency.

Streamline AI orchestration with Nodegrid

Contact ZPE Systems today to learn more about using a Nodegrid serial console as the foundation for your AI orchestration platform.

Contact Us

Edge Computing Use Cases in Telecom
https://zpesystems.com/edge-computing-use-cases-in-telecom-zs/ | Wed, 31 Jul 2024

This blog describes four edge computing use cases in telecom before describing the benefits and best practices for the telecommunications industry.

Telecommunications networks are vast and extremely distributed, with critical network infrastructure deployed at core sites like Internet exchanges and data centers, business and residential customer premises, and access sites like towers, street cabinets, and cell site shelters. This distributed nature lends itself well to edge computing, which involves deploying computing resources like CPUs and storage to the edges of the network where the most valuable telecom data is generated. Edge computing allows telecom companies to leverage data from CPE, networking devices, and users themselves in real-time, creating many opportunities to improve service delivery, operational efficiency, and resilience.

This blog describes four edge computing use cases in telecom before describing the benefits and best practices for edge computing in the telecommunications industry.

4 Edge computing use cases in telecom

1. Enhancing the customer experience with real-time analytics

Each customer interaction, from sales calls to repair requests and service complaints, is a chance to collect and leverage data to improve the experience in the future. Transferring that data from customer sites, regional branches, and customer service centers to a centralized data analysis application takes time, creates network latency, and can make it more difficult to get localized and context-specific insights. Edge computing allows telecom companies to analyze valuable customer experience data, such as network speeds, downtime counts, and number of support contacts, in real time, providing better opportunities to identify and correct issues before they affect future interactions.

2. Streamlining remote infrastructure management and recovery with AIOps

AIOps helps telecom companies manage complex, distributed network infrastructure more efficiently. AIOps (artificial intelligence for IT operations) uses advanced machine learning algorithms to analyze infrastructure monitoring data and provide maintenance recommendations, automated incident management, and simple issue remediation. Deploying AIOps on edge computing devices at each telecom site enables real-time analysis, detection, and response, helping to reduce the duration of service disruptions. For example, AIOps can perform automated root-cause analysis (RCA) to help identify the source of a regional outage before technicians arrive on-site, allowing them to dive right into the repair. Edge AIOps solutions can also continue functioning even if the site is cut off from the WAN or Internet, potentially self-healing downed networks without the need to deploy repair techs on-site.
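
As a simplified illustration of the detection half of this workflow, the following Python sketch flags a metric sample that deviates sharply from its recent rolling baseline; real AIOps platforms layer event correlation and automated RCA on top of signals like this.

```python
# Toy illustration of edge-local anomaly detection: flag a sample that
# deviates sharply from its rolling baseline. Real AIOps platforms layer
# event correlation and automated RCA on top of signals like this.
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    def __init__(self, window: int = 60, threshold: float = 3.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold  # z-score cutoff

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous versus the rolling window."""
        anomalous = False
        if len(self.samples) >= 10:  # wait for a minimal baseline
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                anomalous = True
        self.samples.append(value)
        return anomalous

latency = RollingAnomalyDetector()
for ms in [12, 11, 13, 12, 11, 12, 14, 11, 12, 13, 95]:  # 95 ms spike
    if latency.observe(ms):
        print(f"Anomaly: {ms} ms latency -- open incident, start RCA")
```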

3. Preventing environmental conditions from damaging remote equipment

Telecommunications equipment is often deployed in less-than-ideal operating conditions, such as unventilated closets and remote cell site shelters. Heat, humidity, and air particulates can shorten the lifespan of critical equipment or cause expensive service failures, which is why it’s recommended to use environmental monitoring sensors to detect and alert remote technicians to problems. Edge computing applications can analyze environmental monitoring data in real-time and send alerts to nearby personnel much faster than cloud- or data center-based solutions, ensuring major fluctuations are corrected before they damage critical equipment.
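
A minimal sketch of that edge-side alerting loop might look like the following; read_sensor() and page_technician() are placeholders for whatever sensor API (SNMP, Modbus, USB) and notification channel a given site uses, and the thresholds are illustrative.

```python
# Minimal sketch of an edge-side environmental alerting loop.
# read_sensor() and page_technician() are placeholders for a site's
# actual sensor API and notification channel; thresholds are
# illustrative and should be tuned per site.
import time

THRESHOLDS = {"temp_c": 35.0, "humidity_pct": 80.0}

def read_sensor() -> dict:
    # Placeholder: return live readings from the shelter's sensors.
    return {"temp_c": 27.4, "humidity_pct": 55.0}

def page_technician(metric: str, value: float, limit: float) -> None:
    # Placeholder: SMS/email the nearest on-call technician.
    print(f"ALERT: {metric}={value} exceeds limit {limit}")

while True:
    readings = read_sensor()
    for metric, limit in THRESHOLDS.items():
        if readings[metric] > limit:
            page_technician(metric, readings[metric], limit)
    time.sleep(60)  # local loop: no round trip to a cloud or data center
```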

4. Improving operational efficiency with network virtualization and consolidation

Another way to reduce management complexity – as well as overhead and operating expenses – is through virtualization and consolidation. Network functions virtualization (NFV) virtualizes networking equipment like load balancers, firewalls, routers, and WAN gateways, turning them into software that can be deployed anywhere – including edge computing devices. This significantly reduces the physical tech stack at each site, consolidating once-complicated network infrastructure into, in some cases, a single device. For example, the Nodegrid Gate SR provides a vendor-neutral edge computing platform that supports third-party NFVs while also including critical edge networking functionality like out-of-band (OOB) serial console management and 5G/4G cellular failover.
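
To illustrate the "network function as software" idea, here is a hedged sketch using the Docker SDK for Python to launch a virtual firewall function on an edge device's container engine; the image name is hypothetical, so substitute your vendor's NFV image.

```python
# Illustrative sketch of "NFV as a container": launching a virtual
# firewall function on an edge device's Docker engine via the Docker SDK
# for Python. The image name is hypothetical -- substitute your vendor's
# NFV image.
import docker

client = docker.from_env()

vnf = client.containers.run(
    "example.com/vnf/edge-firewall:latest",  # hypothetical NFV image
    name="edge-firewall",
    detach=True,
    network_mode="host",                # VNFs typically need raw NIC access
    cap_add=["NET_ADMIN"],              # let the VNF manage routes and rules
    restart_policy={"Name": "always"},  # survive reboots at unstaffed sites
)
print(f"VNF {vnf.name} running, id={vnf.short_id}")
```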

Edge computing in telecom: Benefits and best practices

Edge computing can help telecommunications companies:

  • Get actionable insights that can be leveraged in real-time to improve network performance, service reliability, and the support experience.
  • Reduce network latency by processing more data at each site instead of transmitting it to the cloud or data center for analysis.
  • Lower CAPEX and OPEX at each site by consolidating the tech stack and automating management workflows with AIOps.
  • Prevent downtime with real-time analysis of environmental and equipment monitoring data to catch problems before they escalate.
  • Accelerate recovery with real-time, AIOps root-cause analysis and simple incident remediation that continues functioning even if the site is cut off from the WAN or Internet.

Management infrastructure isolation, which is recommended by CISA and required by regulations like DORA, is the best practice for improving edge resilience and ensuring a speedy recovery from failures and breaches. Isolated management infrastructure (IMI) prevents compromised accounts, ransomware, and other threats from moving laterally from production resources to the interfaces used to control critical network infrastructure.

A diagram showing isolated management infrastructure (IMI) with Nodegrid.

To ensure the scalability and flexibility of edge architectures, the best practice is to use vendor-neutral platforms to host, connect, and secure edge applications and workloads. Moving away from dedicated device stacks and taking a “platformization” approach allows organizations to easily deploy, update, and swap out functions and services on demand. For example, Nodegrid edge networking solutions have a Linux-based OS that supports third-party VMs, Docker containers, and NFVs. Telecom companies can use Nodegrid to run edge computing workloads as well as asset management software, customer experience analytics, AIOps, and edge security solutions like SASE.

Vendor-neutral platforms help reduce hardware overhead costs to deploy new edge sites, make it easy to spin-up new NFVs to meet increased demand, and allow telecom organizations to explore different edge software capabilities without costly hardware upgrades. For example, the Nodegrid Gate SR is available with an Nvidia Jetson Nano card that’s optimized for AI workloads, so companies can run innovative artificial intelligence at the edge alongside networking and infrastructure management workloads rather than purchasing expensive, dedicated GPU resources.

A diagram showing centralized edge management and orchestration.

Finally, to ensure teams have holistic oversight of the distributed edge computing architecture, the best practice is to use a centralized, cloud-based edge management and orchestration (EMO) platform. This platform should also be vendor-neutral to ensure complete coverage and should use out-of-band management to provide continuous management access to edge infrastructure even during a major service outage.

Streamlined, cost-effective edge computing with Nodegrid

Nodegrid’s flexible, vendor-neutral platform adapts to all edge computing use cases in telecom. Watch a demo to see Nodegrid’s telecom solutions in action.

Watch a demo

Edge Computing Use Cases in Retail
https://zpesystems.com/edge-computing-use-cases-in-retail-zs/ | Thu, 25 Jul 2024

This blog describes five potential edge computing use cases in retail and provides more information about the benefits of edge computing for the retail industry.

Automated transportation robots move boxes in a warehouse, one of many edge computing use cases in retail
Retail organizations must constantly adapt to meet changing customer expectations, mitigate external economic forces, and stay ahead of the competition. Technologies like the Internet of Things (IoT), artificial intelligence (AI), and other forms of automation help companies improve the customer experience and deliver products at the pace demanded in the age of one-click shopping and two-day shipping. However, connecting individual retail locations to applications in the cloud or centralized data center increases network latency, security risks, and bandwidth utilization costs.

Edge computing mitigates many of these challenges by decentralizing cloud and data center resources and distributing them at the network’s “edges,” where most retail operations take place. Running applications and processing data at the edge enables real-time analysis and insights and ensures that systems remain operational even if Internet access is disrupted by an ISP outage or natural disaster. This blog describes five potential edge computing use cases in retail and provides more information about the benefits of edge computing for the retail industry.

5 Edge computing use cases in retail


1. Security video analysis

Security cameras are crucial to loss prevention, but constantly monitoring video surveillance feeds is tedious and difficult for even the most experienced personnel. AI-powered video surveillance systems use machine learning to analyze video feeds and detect suspicious activity with greater vigilance and accuracy. Edge computing enhances AI surveillance by allowing solutions to analyze video feeds in real-time, potentially catching shoplifters in the act and preventing inventory shrinkage.

2. Localized, real-time insights

Retailers have a brief window to meet a customer’s needs before they get frustrated and look elsewhere, especially in a brick-and-mortar store. A retail store can use an edge computing application to learn about customer behavior and purchasing activity in real-time. For example, they can use this information to rotate the products featured on aisle endcaps to meet changing demand, or staff additional personnel in high-traffic departments at certain times of day. Stores can also place QR codes on shelves that customers scan if a product is out of stock, immediately alerting a nearby representative to provide assistance.

3. Enhanced inventory management

Effective inventory management is challenging even for the most experienced retail managers, but ordering too much or too little product can significantly affect sales. Edge computing applications can improve inventory efficiency by making ordering recommendations based on observed purchasing patterns combined with real-time stocking updates as products are purchased or returned. Retailers can use this information to reduce carrying costs for unsold merchandise while preventing out-of-stocks, improving overall profit margins.
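
A toy version of that ordering logic is a classic reorder-point calculation, sketched below with illustrative figures:

```python
# Toy reorder-point calculation behind the ordering logic described
# above. All figures are illustrative.
def reorder_recommendation(on_hand: int, daily_sales_avg: float,
                           lead_time_days: int, safety_stock: int) -> int:
    """Units to order now, or 0 if stock covers lead-time demand."""
    reorder_point = daily_sales_avg * lead_time_days + safety_stock
    if on_hand > reorder_point:
        return 0
    # Order enough to restore the buffer plus one lead time of sales.
    return round(reorder_point + daily_sales_avg * lead_time_days - on_hand)

# Example: 40 units on hand, ~6 sold/day, 5-day lead time, 10 safety units.
print(reorder_recommendation(40, 6.0, 5, 10))  # -> 30
```

An edge deployment feeds this kind of calculation with live point-of-sale and returns data, so the recommendation updates as purchases happen rather than on an overnight batch cycle.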

4. Building management

Using IoT devices to monitor and control building functions such as HVAC, lighting, doors, power, and security can help retail organizations reduce the need for on-site facilities personnel, and make more efficient use of their time. Data analysis software helps automatically optimize these systems for efficiency while ensuring a comfortable customer experience. Running this software at the edge allows automated processes to respond to changing conditions in real-time, for example, lowering the A/C temperature or routing more power to refrigerated cases during a heatwave.

5. Warehouse automation

The retail industry uses warehouse automation systems to improve the speed and efficiency at which goods are delivered to stores or directly to users. These systems include automated storage and retrieval systems, robotic pickers and transporters, and automated sortation systems. Companies can use edge computing applications to monitor, control, and maintain warehouse automation systems with minimal latency. These applications also remain operational even if the site loses internet access, improving resilience.

The benefits of edge computing for retail

The benefits of edge computing in a retail setting include:
  • Reduced latency: Edge computing decreases the number of network hops between devices and the applications they rely on, reducing latency and improving the speed and reliability of retail technology at the edge.
  • Real-time insights: Edge computing can analyze data in real-time and provide actionable insights to improve the customer experience before a sale is lost or reduce waste before monthly targets are missed.
  • Improved resilience: Edge computing applications can continue functioning even if the site loses Internet or WAN access, enabling continuous operations and reducing the costs of network downtime.
  • Risk mitigation: Keeping sensitive internal data like personnel records, sales numbers, and customer loyalty information on the local network mitigates the risk of interception and distributes the attack surface.

Edge computing can also help retail companies lower their operational costs at each site by reducing bandwidth utilization on expensive MPLS links and decreasing expenses for cloud data storage and computing. Another way to lower costs is by using consolidated, vendor-neutral solutions to run, connect, and secure edge applications and workloads.

For example, the Nodegrid Gate SR integrated branch services router delivers an entire stack of edge networking, infrastructure management, and computing technologies in a single, streamlined device. The open, Linux-based Nodegrid OS supports VMs and Docker containers for third-party edge computing applications, security solutions, and more. The Gate SR is also available with an Nvidia Jetson Nano card that’s optimized for AI workloads to help retail organizations reduce the hardware overhead costs of deploying artificial intelligence at the edge.

Consolidated edge computing with Nodegrid

Nodegrid’s flexible, scalable platform adapts to all edge computing use cases in retail. Watch a demo to see Nodegrid’s retail network solutions in action.

Watch a demo

Why Securing IT Means Replacing End-of-Life Console Servers
https://zpesystems.com/why-securing-it-means-replacing-end-of-life-console-servers/ | Thu, 25 Jul 2024

Rene Neumann, Director of Solution Engineering, discusses why it's crucial to replace end-of-life console servers to protect IT.


 

The world as we know it is connected to IT, and IT relies on its underlying infrastructure. Organizations must prioritize maintaining this infrastructure; otherwise, any disruption or breach has a ripple effect that takes services offline for millions of users (take the recent CrowdStrike outage, for example). A big part of this maintenance is ensuring that all hardware components, including console servers, are up-to-date and secure. Every console server eventually reaches end-of-life (EOL) and needs to be replaced, but for many reasons, whether budgetary concerns or the "if it isn't broken" mentality, IT teams often keep their EOL devices. Let's look at the risks of using EOL console servers, and why replacing them goes hand-in-hand with securing IT.

The Risks of Using End-of-Life Console Servers

End-of-life console servers can undermine the security and functionality of IT systems. These risks include:

1. Lack of Security Features and Updates

Aging console servers lack adequate hardware and management security features, meaning they can’t support a zero trust approach. On top of this, once a console server reaches EOL, the manufacturer stops providing security patches and updates. The device then becomes vulnerable to newly discovered CVEs and complex cyberattacks (like the MOVEit and Ragnar Locker breaches). Cybercriminals often target outdated hardware because they know that these devices are no longer receiving updates, making them easy entry points for launching attacks.

2. Compliance Issues

Many industries have stringent regulatory requirements regarding data security and IT infrastructure. DORA, NIS2 (EU), NIST CSF 2.0 (US), PCI DSS 4.0 (finance), and the CER Directive are just a few of the updated regulations that are cracking down on how organizations architect IT, including the management layer. Using EOL hardware can lead to non-compliance, resulting in fines and legal repercussions. Regulatory bodies expect organizations to use up-to-date and secure equipment to protect sensitive information.

3. Prolonged Recovery

EOL console servers are prone to failures and inefficiencies. As these devices age, their performance deteriorates, leading to increased downtime and disruptions. Most console servers are Gen 2, meaning they offer basic remote troubleshooting (to address break/fix scenarios) and limited automation capabilities. When there is a severe disruption, such as a ransomware attack, hackers can easily access and encrypt these devices to lock out admin access. Organizations then must endure prolonged recovery (just look at the still-ongoing CrowdStrike outage, or last year's MGM attack) because they need to physically decommission and restore their infrastructure.

 

The Importance of Replacing EOL Console Servers

Here’s why replacing EOL console servers is essential to securing IT:

1. Modern Security Approach

Zero trust is an approach that uses segmentation across IT assets. This ensures that only authorized users can access resources necessary for their job function. This approach requires SAML, SSO, MFA/2FA, and role-based access controls, which are only supported by modern console servers. Modern devices additionally feature advanced security through encryption, signed OS, and tampering detection. This ensures a complete cyber and physical approach to security.

2. Protection Against New Threats

New CVEs and evolving threats can easily take advantage of EOL devices that no longer receive updates. Modern console servers benefit from ongoing support in the form of firmware upgrades and security patches. Upgrading with a security-focused device vendor can drastically shrink the attack surface, by addressing supply chain security risks, codebase integrity, and CVE patching.

3. Ease of Compliance

EOL devices lack modern security features, but this isn’t the only reason why they make it difficult or impossible to comply with regulations. They also lack the ability to isolate the control plane from the production network (see Diagram 1 below), meaning attackers can easily move between the two in order to launch ransomware and steal sensitive information. Watchdog agencies and new legislation are stipulating that organizations follow the latest best practice of separating the control plane from production, called Isolated Management Infrastructure (IMI). Modern console servers make this best practice simple to achieve by offering drop-in out-of-band that is completely isolated from production assets (see Diagram 2 below). This means that the organization is always in control of its IT assets and sensitive data.


Diagram 1: Though a common approach, Gen 2 out-of-band lacks isolation and leaves management interfaces vulnerable to the internet.


Diagram 2: Gen 3 out-of-band fully isolates the control plane to guarantee organizations retain control of their IT assets and sensitive info.

4. Faster Recovery

New console servers are designed to handle more workloads and functions, which eliminates single-purpose devices and shrinks the attack surface. They can also run VMs and Docker containers to host applications. This enables what Gartner calls the Isolated Recovery Environment (IRE) (see Diagram 3 below), which is becoming essential for faster recovery from ransomware. Since the IMI component prohibits attackers from accessing the control plane, admins retain control during an attack. They can use the IMI to deploy their IRE and the necessary applications — remotely — to decommission, cleanse, and restore their infected infrastructure. This means that they don’t have to roll trucks week after week when there’s an attack; they just need to log into their management infrastructure to begin assessing and responding immediately, which significantly reduces recovery times.


Diagram 3: The Isolated Recovery Environment allows for a comprehensive and rapid response to ransomware attacks.

Watch How To Secure The Network Backbone

I recently presented at Cisco Live Vegas on how to secure the network’s backbone using Isolated Management Infrastructure. I walk you through the evolution of network management, and it becomes obvious that end-of-life console servers are a major security concern, both from the hardware perspective itself and their lack of isolation capabilities. Watch my 10-minute presentation from the show and download some helpful resources, including the blueprint to building IMI.

Cisco Live 2024 – Securing the Network Backbone

The CrowdStrike Outage: How to Recover Fast and Avoid the Next Outage
https://zpesystems.com/the-crowdstrike-outage-how-to-recover-fast-and-avoid-the-next-outage/ | Tue, 23 Jul 2024

The CrowdStrike outage on July 19, 2024 affected millions of critical organizations. Here's how to recover fast and avoid the next outage.


 

On July 19, 2024, CrowdStrike, a leading cybersecurity firm renowned for its advanced endpoint protection and threat intelligence solutions, experienced a significant outage that disrupted operations for many of its clients. This outage, triggered by a software upgrade, resulted in crashes for Windows PCs, creating a wave of operational challenges for banks, airports, enterprises, and organizations worldwide. This blog post explores what transpired during this incident, what caused the outage, and the broader implications for the cybersecurity industry.

What happened?

The incident began on the morning of July 19, 2024, when numerous CrowdStrike customers started reporting issues with their Windows PCs. Users experienced the BSOD (blue screen of death), which is when Windows crashes and renders devices unusable. As the day went on, it became evident that the problem was widespread and directly linked to a recent software upgrade deployed by CrowdStrike.

Timeline of Events

  1. Initial Reports: Early in the day, airports, hospitals, and critical infrastructure operators began experiencing unexplained crashes on their Windows PCs. The issue was quickly reported to CrowdStrike’s support team.
  2. Incident Acknowledgement: CrowdStrike acknowledged the issue via their social media channels and direct communications with affected clients, confirming that they were investigating the cause of the crashes.
  3. Root Cause Analysis: CrowdStrike’s engineering team worked diligently to identify the root cause of the problem. They soon determined that a software upgrade released the previous night was responsible for the crashes.
  4. Mitigation Efforts: Upon isolating the faulty software update, CrowdStrike issued guidance on how to roll back the update and provided patches to fix the issue.

What caused the CrowdStrike outage?

The root cause of the outage was a software upgrade intended to enhance the functionality and security of CrowdStrike’s Falcon sensor endpoint protection platform. However, this upgrade contained a bug that conflicted with certain configurations of Windows PCs, leading to system crashes. Several factors contributed to the incident:

  1. Insufficient Testing: The software update did not undergo adequate testing across all possible configurations of Windows PCs. This oversight meant that the bug was not detected before the update was deployed to customers.
  2. Complex Interdependencies: The incident highlights the complex interdependencies between software components and operating systems. Even minor changes can have unforeseen impacts on system stability.
  3. Rapid Deployment: In the cybersecurity industry, quick responses to emerging threats are crucial. However, the pressure to deploy updates rapidly can sometimes lead to insufficient testing and quality assurance processes.

We need to remember one important fact: whether software is written by humans or AI, there will be mistakes in coding and testing. When an issue slips through the cracks, the customer lab is the last resort to catch it. Usually, this is done with a controlled rollout, where the IT team first upgrades their lab equipment, performs further testing, puts a rollback plan in place, and pushes the update to a less critical site. But in a cloud-connected SaaS world, the customer is no longer in control. That's why customers sign waivers stating that if such an incident occurs, the company that caused the problem is not liable. Experts are saying the only way to address this challenge is to have an infrastructure that's designed, deployed, and operated for resilience. We discuss this architecture further down in this article.
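
For illustration, the controlled-rollout discipline described above can be expressed as a simple ring-based loop; deploy(), health_ok(), and rollback() are placeholders for real patch-management tooling, and the ring membership is illustrative.

```python
# Sketch of the controlled-rollout discipline described above: push an
# update ring by ring, verify health after each ring, and revert on
# failure. deploy(), health_ok(), and rollback() are placeholders for
# real patch-management tooling; ring membership is illustrative.
RINGS = [
    ["lab-01"],                          # ring 0: lab equipment
    ["branch-dev-01", "branch-dev-02"],  # ring 1: a less critical site
    ["branch-fleet", "hq-fleet"],        # ring 2: everything else
]

def deploy(update_id, targets): print(f"deploying {update_id} to {targets}")
def health_ok(targets) -> bool: return True  # replace with real checks
def rollback(targets): print(f"rolling back {targets}")

def staged_rollout(update_id: str) -> bool:
    """Push update_id ring by ring; halt and revert on a failed check."""
    completed = []
    for ring in RINGS:
        deploy(update_id, ring)
        if not health_ok(ring):
            for done in reversed(completed + [ring]):
                rollback(done)  # revert every ring touched so far
            return False
        completed.append(ring)
    return True

staged_rollout("security-agent-update-2024-07")
```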

How to recover from the CrowdStrike outage

CrowdStrike gives two options for recovering:

  • Option 1: Reboot in Safe Mode – Reboot the affected device in Safe Mode, locate and delete the file “C-00000291*.sys”, and then restart the device (a scripted version of this cleanup is sketched after this list).
  • Option 2: Re-image – Download and configure the recovery utility to create a new Windows image, add this image to a USB drive, and then insert this USB drive into the target device. The utility will automatically find and delete the file that’s causing the crash.
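
For fleets where admins can script inside Safe Mode or a recovery image, the Option 1 cleanup reduces to a few lines. This sketch deletes the faulty channel file named in CrowdStrike's public guidance; run it with administrator rights and reboot afterward.

```python
# Sketch of the Option 1 cleanup for machines where scripting is possible
# inside Safe Mode or a recovery image: delete the faulty channel file
# named in CrowdStrike's public guidance. Run with admin rights, then
# reboot normally.
import glob
import os

PATTERN = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

matches = glob.glob(PATTERN)
for path in matches:
    os.remove(path)
    print(f"deleted {path}")
if not matches:
    print("no matching channel file found -- nothing to do")
```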

The biggest obstacle, and the one costing organizations the most time and money, is that both recovery methods require IT staff to be physically present at each affected device. They need to go one by one, manually remediating via Safe Mode or physically inserting the USB drive. What makes this more difficult is that many organizations use physical and software/management security controls to limit access: locked device cabinets slow down physical access to devices, and things like role-based access policies and disk encryption can make Safe Mode unusable. Because this outage affected more than 8.5 million computers, this kind of work won't scale efficiently. That's why organizations are turning to Isolated Management Infrastructure (IMI) and the Isolated Recovery Environment (IRE).

How IMI and IRE help you recover faster

IMI is a dedicated control plane network that's meant for administration and recovery of IT systems, including Windows PCs affected by the CrowdStrike outage. It uses the concept of out-of-band management, where you deploy a management device that is connected to dedicated management ports of your IT infrastructure (e.g., serial ports, IPMI ports, and other ethernet management ports). IMI also allows you to deploy recovery services for your digital estate that are immutable and near-line, ready when recovery needs to take place.

IMI does not rely at all on the production assets, as it has its own dedicated remote access via WAN links like 4G/5G, and can contain and encrypt recovery keys and tools with zero trust.

IMI gives teams remote, low-level access to devices so they can recover their systems remotely without the need to visit sites. Organizations that employ IMI can revert to a golden image through automation, or deploy bootable tools to all the computers at the site to rescue them without data loss.

The dedicated out-of-band access to serial/IPMI and management ports gives automation software the same abilities as if a physical crash cart was pulled up to the servers. ZPE Systems’ Nodegrid (now a brand of Legrand) enables this architecture as explained next. Using Nodegrid and ZPE Cloud, teams can use either option to recover from the CrowdStrike outage:

  • Option 1: Reboot into a Preboot Execution Environment (PXE) – Nodegrid gives low-level network access to connected Windows machines as if teams were sitting directly in front of the affected device. This means they can remote in, reboot to a network image, remote into the booted image, delete the faulty file, and restart the system.
  • Option 2: Re-image – ZPE Cloud serves as a file repository and orchestration engine. Teams can upload their working Windows image, and then automatically push this across their global fleet of affected devices. This option speeds up recovery times exponentially.
  • Option 3 – Run a Windows Deployment Server on the IMI device at the location and re-image servers and workstations once a good backup of the data has been located. This backup can be made available through the IMI after the initial image has been deployed. The IMI can provide dedicated secure access to the Intune services in your M365 cloud, and the backups do not have to transit the entire internet for all workstations at once, speeding up recovery many times over.

All of these options can be performed at scale or even automated. Server recovery with large backups, although it may take a couple of hours, can be delivered locally and tracked for performance and consistency.

But what about the risk of making mistakes when you have to repeat these tasks? Won’t this cause more damage and data loss?

Any team can make a mistake repeating these recovery tasks over a large footprint, and cause further damage or loss of data, slowing the recovery further. Automated recovery through the IMI addresses this, and can provide reliable recording and reporting to ensure that the restoration is complete and trusted. 

What does IMI look like?

Here’s a simplified view of Isolated Management Infrastructure. You can see that ZPE’s Nodegrid device is needed, which sits beside production infrastructure and provides the platform for hosting all the tools necessary for fast recovery.

A diagram showing how to use Nodegrid Gen 3 OOB to enable IMI.

What you need to deploy IMI for recovery:

  1. Out-of-band appliance with serial, USB, ethernet interfaces (e.g., ZPE’s Nodegrid Net SR)
  2. Switchable PDU: Legrand Server Tech or Raritan PDU
  3. Windows PXE Boot image

Here’s the order of operations for a faster CrowdStrike outage recovery:

  • Option 1 – Recover
    1. Deploy IMI with a ZPE Nodegrid device running a Preboot Execution Environment (PXE) service, which pushes Windows boot images to the computers when they boot up
    2. Send recovery keys from Intune to IMI remote storage over ZPE Cloud's zero trust platform, available in the cloud or air-gapped through Nodegrid Manager
    3. Enable the PXE service (automated across the entire enterprise) and define the PXE recovery image
    4. Use serial or IP control of power to the computers, or Intel vPro or IPMI where available, to reboot all machines
    5. All machines boot and check in to a control tower for PXE, or are made available to remote into using stored passwords in the PXE environment, Windows AD, or other Privileged Access Management (PAM)
    6. Delete the faulty files
    7. Reboot

  • Option 2 – Lean re-image
    1. IMI deployed with a Windows PXE boot image and a running PXE service
    2. Enable access to the cloud and Azure Intune from the IMI remote storage that holds the local image for the PC
    3. Enable PXE service (automated across entire enterprise) and define the PXE recovery image
    4. Use serial or IP control of power to the computers, or if possible, Intel vPro or IPMI capable machines, to reboot all machines
    5. Machines will boot and check in to Intune either through the IMI or through normal Internet access and finish imaging
    6. Once the machine completes the Intune tasks, Intune will signal backups to come down to the machines. If these backups are offsite, they can be staged on the IMI through backup software running on a virtual machine located on the IMI appliance to speed up recovery and not impede the Internet connection at the remote site
    7. Pre-stage backups onto local storage, push recovery from the virtual machine on the IMI

  • Option 3 – Windows controlled re-image
    1. Windows Deployment Server (WDS) installed as a virtual machine running on the IMI appliance (offline to prevent issues, or online but under a slowed deployment cycle in case of an issue)
    2. Send recovery keys from Intune to IMI remote storage over a zero trust interface in cloud or air-gapped
    3. Use serial or IP control of power to the computers, or if possible, Intel vPro or IPMI capable machines, to reboot all machines
    4. Machines will boot and check in to the WDS for re-imaging
    5. Machines will boot and check in to Intune either through the IMI or through normal Internet access and finish imaging
    6. Once the machine completes the Intune tasks, Intune will signal backups to come down to the machines. If these backups are offsite, they can be staged on the IMI through backup software running on a virtual machine located on the IMI appliance to speed up recovery and not impede the Internet connection at the remote site
    7. Pre-stage backups onto local storage, push recovery from the virtual machine on the IMI

Deploy IMI now to recover and avoid the next outage

Get in touch for help choosing the right size IMI deployment for your organization. Nodegrid and ZPE Cloud are the drop-in solution to recovering from the CrowdStrike outage, with plenty of device options to fit any budget and environment size. Contact ZPE Sales now or download the blueprint to help you begin implementing IMI.

Improving Your Zero Trust Security Posture
https://zpesystems.com/zero-trust-security-posture-zs/ | Tue, 16 Jul 2024

This blog provides advice for improving your zero trust security posture with a multi-layered strategy that mitigates weaknesses for complete coverage.


The current cyber threat landscape is daunting, with attacks occurring so frequently that security experts recommend operating under the assumption that your network is already breached. Major cyber attacks – and the disruptions they cause – frequently make news headlines. For example, the recent LendingTree breach exposed consumer data, which could affect the company’s reputation and compliance status. An attack on auto dealership software company CDK Global took down the platform and disrupted business for approximately 15,000 car sellers – an outage that’s still ongoing as of this article’s writing.

The zero trust security methodology outlines the best practices for limiting the blast radius of a successful breach by preventing malicious actors from moving laterally through the network and accessing the most valuable or sensitive resources. Many organizations have already begun their zero trust journey by implementing role-based access controls (RBAC), multi-factor authentication (MFA), and other security solutions, but still struggle with coverage gaps that result in ransomware attacks and other disruptive breaches. This blog provides advice for improving your zero trust security posture with a multi-layered strategy that mitigates weaknesses for complete coverage.

How to improve your zero trust security posture

  • Gain a full understanding of your protect surface: Use automated discovery tools to identify all the data, assets, applications, and services that an attacker could potentially target.
  • Micro-segment your network with micro-perimeters: Implement specific policies, controls, and trust verification mechanisms to mitigate protect surface vulnerabilities.
  • Isolate and defend your management infrastructure: Use OOB management and hardware security to prevent attackers from compromising the control plane.
  • Defend your cloud resources: Understand the shared responsibility model and use cloud-specific tools like a CASB to prevent shadow IT and enforce zero trust.
  • Extend zero trust to the edge: Use edge-centric solutions like SASE to extend zero trust policies and controls to remote network traffic, devices, and users.

Gain a full understanding of your protect surface

Many security strategies focus on defending the network’s “attack surface,” or all the potential vulnerabilities an attacker could exploit to breach the network. However, zero trust is all about defending the “protect surface,” or all the data, assets, applications, and services that an attacker could potentially try to access. The key difference is that zero trust doesn’t ask you to try to cover any possible weakness in a network, which is essentially impossible. Instead, it wants you to look at the resources themselves to determine what has the most value to an attacker, and then implement security controls that are tailored accordingly.

Gaining a full understanding of all the resources on your network can be extraordinarily challenging, especially with the proliferation of SaaS apps, mobile devices, and remote workforces. There are automated tools that can help IT teams discover all the data, apps, and devices on the network. Application discovery and dependency mapping (ADDM) tools help identify all on-premises software and third-party dependencies; cloud application discovery tools do the same for cloud-hosted apps by monitoring network traffic to cloud domains. Sensitive data discovery tools scan all known on-premises or cloud-based resources for personally identifiable information (PII) and other confidential data, and there are various device management solutions to detect network-connected hardware, including IoT devices.

  • Tip: This step can’t be completed one time and then forgotten – teams should execute discovery processes on a regular, scheduled basis to limit gaps in protection. 

Micro-segment your network with micro-perimeters

Micro-segmentation is a cornerstone of zero-trust networks. It involves logically separating all the data, applications, assets, and services according to attack value, access needs, and interdependencies. Then, teams implement granular security policies and controls tailored to the needs of each segment, establishing what are known as micro-perimeters. Rather than trying to account for every potential vulnerability with one large security perimeter, teams can just focus on the tools and policies needed to cover the specific vulnerabilities of a particular micro-segment.

Network micro-perimeters help improve your zero trust security posture with:

  • Granular access policies granting the least amount of privileges needed for any given workflow. Limiting the number of accounts with access to any given resource, and limiting the number of privileges granted to any given account, significantly reduces the amount of damage a compromised account (or malicious actor) is capable of inflicting.
  • Targeted security controls addressing the specific risks and vulnerabilities of the resources in a micro-segment. For example, financial systems need stronger encryption, strict data governance monitoring, and multiple methods of trust verification, whereas an IoT lighting system requires simple monitoring and patch management, so the security controls for these micro-segments should be different.
  • Trust verification using context-aware policies to catch accounts exhibiting suspicious behavior and prevent them from accessing sensitive resources. If a malicious outsider compromises an authorized user account and MFA device – or a disgruntled employee uses their network privileges to harm the company – it can be nearly impossible to prevent data exposure. Context-aware policies can stop a user from accessing confidential resources outside of typical operating hours, or from unfamiliar IP addresses, for example. Additionally, user entity and behavior analytics (UEBA) solutions use machine learning to detect other abnormal and risky behaviors that could indicate malicious intent.
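
To make the idea of context-aware policies concrete, below is a minimal Python sketch of an access check that combines operating hours with source-network trust. The networks, hours, and function are illustrative assumptions, not any specific product's API; real zero trust platforms evaluate many more signals, such as device posture, geolocation, and UEBA risk scores.

```python
# Context-aware access check sketch: deny access outside business hours
# or from unfamiliar networks. All policy values are illustrative only.
from datetime import datetime, time
from ipaddress import ip_address, ip_network

TRUSTED_NETWORKS = [ip_network("10.0.0.0/8")]  # example corporate ranges
BUSINESS_HOURS = (time(7, 0), time(19, 0))     # example operating window

def is_access_allowed(source_ip: str, now=None) -> bool:
    """Return True only if the request satisfies the context-aware policy."""
    now = now or datetime.now()
    in_hours = BUSINESS_HOURS[0] <= now.time() <= BUSINESS_HOURS[1]
    from_trusted = any(
        ip_address(source_ip) in net for net in TRUSTED_NETWORKS
    )
    return in_hours and from_trusted

# A request from an unfamiliar IP is denied regardless of the time of day.
print(is_access_allowed("203.0.113.7"))  # False
```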

Isolate and defend your management infrastructure

For zero trust to be effective, organizations must apply consistently strict security policies and controls to every component of their network architecture, including the management interfaces used to control infrastructure. Otherwise, a malicious actor could use a compromised sysadmin account to hijack the control plane and bring down the entire network.

According to a recent CISA directive, the best practice is to isolate the network’s control plane so that management interfaces are inaccessible from the production network. Many new cybersecurity regulations, including PCI DSS 4.0, DORA, NIS2, and the CER Directive, also either strongly recommend or require management infrastructure isolation.

Isolated management infrastructure (IMI) prevents compromised accounts, ransomware, and other threats from moving laterally to or from the production LAN. It gives teams a safe environment to recover from ransomware or other cyberattacks without risking reinfection, which is known as an isolated recovery environment (IRE). Management interfaces and the IRE should also be protected by granular, role-based access policies, multi-factor authentication, and strong hardware roots of trust to further mitigate risk.
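
A quick way to sanity-check this isolation is to probe management interfaces from a production host and confirm that nothing answers. The sketch below assumes hypothetical management IPs and ports; it is a spot check, not a substitute for a full audit.

```python
# Isolation spot-check sketch: from a production host, attempt TCP
# connections to management interfaces; every one should fail. The
# addresses and ports below are hypothetical.
import socket

MGMT_TARGETS = [("192.168.100.10", 22), ("192.168.100.10", 443)]

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if the TCP port answers within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in MGMT_TARGETS:
    status = "EXPOSED" if is_reachable(host, port) else "isolated"
    print(f"{host}:{port} -> {status}")
```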

Diagram: Using Nodegrid Gen 3 OOB to enable IMI.

The easiest and most secure way to implement IMI is with Gen 3 out-of-band (OOB) serial console servers, like the Nodegrid solution from ZPE Systems. These devices use alternative network interfaces like 5G/4G LTE cellular to ensure complete isolation and 24/7 management access even during outages. They’re protected by hardware security features like TPM 2.0 and GPS geofencing, and they integrate with zero trust solutions like identity and access management (IAM) and UEBA to enable consistent policy enforcement.

Defend your cloud resources

The vast majority of companies host some or all of their workflows in the cloud, which significantly expands and complicates the attack surface while making it more challenging to identify and defend the protect surface. Some organizations also lack a complete understanding of the shared responsibility model for varying cloud services, increasing the chances of coverage gaps. Additionally, many organizations struggle with “shadow IT,” which occurs when individual business units implement cloud applications without going through onboarding, preventing security teams from applying zero trust controls.

The first step toward improving your zero trust security posture in the cloud is to ensure you understand where your cloud service provider’s responsibilities end and yours begin. For instance, most SaaS providers handle all aspects of security except IAM and data protection, whereas IaaS (Infrastructure-as-a-Service) providers are only responsible for protecting their physical and virtual infrastructure.

It’s also vital that security teams have a complete picture of all the cloud services in use by the organization and a way to deploy and enforce zero trust policies in the cloud. For example, a cloud access security broker (CASB) is a solution that discovers all the cloud services in use by an organization and allows teams to monitor and manage security for the entire cloud architecture. A CASB provides capabilities like data governance, malware detection, and adaptive access controls, so organizations can protect their cloud resources with the same techniques used in the on-premises environment.
Example Cloud Access Security Broker Capabilities

  • Visibility: Cloud service discovery; monitoring and reporting
  • Compliance: User authentication and authorization; data governance and loss prevention
  • Threat protection: Malware (e.g., virus, ransomware) detection; user and entity behavior analytics (UEBA)
  • Data security: Data encryption and tokenization; data leak prevention
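
Of these capabilities, cloud service discovery is the easiest to reason about: in essence, a CASB inspects traffic or logs for cloud domains and compares them against a sanctioned list. The Python sketch below shows a bare-bones version of that idea, with made-up log lines and domain lists; an actual CASB automates this at scale and with far richer context.

```python
# Shadow-IT spotting sketch: flag DNS queries to SaaS domains that are
# not on the sanctioned list. The log format and domain lists are
# illustrative; a real CASB does this (and much more) automatically.
SANCTIONED = {"office365.com", "salesforce.com"}
WATCHED_SAAS = {"dropbox.com", "box.com", "slack.com", "notion.so"}

def flag_shadow_it(dns_log_lines):
    """Yield (domain, count) for watched SaaS domains not sanctioned."""
    counts = {}
    for line in dns_log_lines:
        domain = line.strip().split()[-1].lower()    # assume last field is the query
        root = ".".join(domain.rsplit(".", 2)[-2:])  # crude registrable domain
        if root in WATCHED_SAAS and root not in SANCTIONED:
            counts[root] = counts.get(root, 0) + 1
    yield from sorted(counts.items(), key=lambda kv: -kv[1])

sample = ["2024-01-05T10:02 client1 query dropbox.com",
          "2024-01-05T10:03 client2 query files.dropbox.com"]
for domain, hits in flag_shadow_it(sample):
    print(f"Unsanctioned SaaS detected: {domain} ({hits} queries)")
```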

Extend zero trust to the edge

Modern enterprise networks are highly decentralized, with many business operations taking place at remote branches, Internet of Things (IoT) deployment sites, and end-users’ homes. Extending security controls to the edge with on-premises zero trust solutions is very difficult without backhauling all remote traffic through a centralized firewall, which creates bottlenecks that affect performance and reliability. Luckily, the market for edge security solutions is rapidly growing and evolving to help organizations overcome these challenges. 

Secure Access Service Edge (SASE) is a type of security platform that delivers core capabilities as a managed, typically cloud-based service for the edge. SASE uses software-defined wide area networking (SD-WAN) to intelligently and securely route edge traffic through the SASE tech stack, allowing the application and enforcement of zero trust controls. In addition to CASB and next-generation firewall (NGFW) features, SASE usually includes zero trust network access (ZTNA), which offers VPN-like functionality to connect remote users to enterprise resources from outside the network. ZTNA is more secure than a VPN because it only grants access to one app at a time, requiring separate authorization requests and trust verification attempts to move to different resources.
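
The per-application grant model is the heart of the difference from a VPN, and the toy Python sketch below illustrates it: every (user, app) pair needs its own grant, so authorization to one resource implies nothing about any other. The users, apps, and grant table are invented for illustration.

```python
# Toy ZTNA authorization sketch: access is granted per (user, app) pair,
# never to "the network." All names and the grant table are invented.
GRANTS = {("alice", "crm"), ("alice", "wiki"), ("bob", "wiki")}

def request_access(user: str, app: str) -> bool:
    """Authorize one user for one app; grants to other apps are separate."""
    allowed = (user, app) in GRANTS
    print(f"{user} -> {app}: {'granted' if allowed else 'denied'}")
    return allowed

request_access("alice", "crm")      # granted
request_access("alice", "billing")  # denied: requires its own request
```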

Accelerating the zero trust journey

Zero trust is not a single security solution that you can implement once and forget about – it requires constant analysis of your security posture to identify and defend weaknesses as they arise. The best way to ensure adaptability is by using vendor-agnostic platforms to host and orchestrate zero trust security. This will allow you to add and change security services as needed without worrying about interoperability issues.

For example, the Nodegrid platform from ZPE Systems includes vendor-neutral serial consoles and integrated branch services routers that can host third-party software such as SASE and NGFWs. These devices also provide Gen 3 out-of-band management for infrastructure isolation and network resilience. Nodegrid protects management interfaces with strong hardware roots-of-trust, embedded firewalls, SAML 2.0 integrations, and other zero trust security features. Plus, with Nodegrid’s cloud-based or on-premises management platform, teams can orchestrate networking, infrastructure, and security workflows across the entire enterprise architecture.


Improve your zero trust security posture with Nodegrid

Using Nodegrid as the foundation for your zero trust network infrastructure ensures maximum agility while reducing management complexity. Watch a Nodegrid demo to learn more.

Schedule a Demo

The post Improving Your Zero Trust Security Posture appeared first on ZPE Systems.

]]>
DORA Act: 5 Takeaways For The Financial Sector https://zpesystems.com/dora-act-5-takeaways-for-the-financial-sector/ Thu, 07 Mar 2024 18:57:50 +0000 https://zpesystems.com/?p=39666 The Digital Operational Resilience Act (DORA Act) outlines significant resilience changes for the financial sector. See how to comply here.

The post DORA Act: 5 Takeaways For The Financial Sector appeared first on ZPE Systems.

]]>

The Digital Operational Resilience Act (DORA) is a regulatory initiative within the European Union that aims to enhance the operational resilience of the financial sector. Its main goal is to prevent and mitigate cyber threats and operational disruptions. The DORA Act outlines regulatory requirements for the security of network and information systems “whereby all firms need to make sure they can withstand, respond to and recover from all types of ICT-related disruptions and threats” (DORA Act website).

Who and What Are Covered Under the DORA Act?

The DORA Act is a regulation that covers all financial entities within the European Union (EU). It recognizes the critical role of information and communication technology (ICT) systems in financial services. DORA applies to financial services including payments, securities, credit rating, algorithmic trading, lending, insurance, and back-office operations. It establishes a framework for ICT risk management through technical standards, which are being released in two phases, the first of which was published on January 17, 2024. The DORA Act will go into effect in its entirety on January 17, 2025.

With cyberattacks constantly in the news cycle, it’s no surprise that governing bodies are putting forth standards for operational resilience. But without combing through this lengthy piece of legislation, what should IT teams start thinking about from a practical standpoint? Here are 5 takeaways on what the DORA Act means for the financial sector.

DORA Act: 5 Takeaways for the Financial Sector

1. Shore up your cybersecurity measures

The DORA Act emphasizes strengthening cybersecurity measures within the financial sector. It requires financial institutions, such as banks, stock exchanges, and financial infrastructure providers, to implement robust cybersecurity controls and protocols. These include adopting advanced authentication mechanisms, encryption standards, and network segmentation to protect sensitive financial data and critical infrastructure from cyber threats. Part of this will also require organizations to apply system patches and updates in a timely manner, which means automated patching will become an essential part of every organization’s security posture.
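
As a hedged example of what the automation piece can look like on a single Debian/Ubuntu host, the sketch below refreshes package metadata and invokes the standard unattended-upgrade tool inside a maintenance window. The window logic is a placeholder; fleet-wide patching would typically run through a configuration management or patch orchestration platform instead.

```python
# Minimal patching sketch for one Debian/Ubuntu host: refresh package
# metadata, then let unattended-upgrade apply the configured (security)
# origins. Requires root; use unattended-upgrade's --dry-run to preview.
import subprocess
from datetime import datetime

MAINTENANCE_HOUR = 3  # illustrative maintenance window: 03:00 local time

def apply_security_updates() -> None:
    """Refresh package lists and apply pending security updates."""
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["unattended-upgrade"], check=True)

if datetime.now().hour == MAINTENANCE_HOUR:
    apply_security_updates()
```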

2. Implement resilience systems

Operational resilience is a key focus area of the DORA Act, aiming to ensure the continuity of essential financial services in the face of cyber threats, natural disasters, and other operational disruptions. Financial institutions are required to develop comprehensive business continuity plans, establish redundant systems and backup facilities, and conduct regular stress tests to assess their ability to withstand and recover from various scenarios. Implementing a resilience system helps with this, as it provides all the infrastructure, tools, and services necessary to continue operating during major incidents.

3. Conduct regular scans for vulnerabilities

The DORA Act mandates that financial institutions implement robust risk management practices to identify, assess, and mitigate cyber risks and operational vulnerabilities. This includes conducting regular assessments, vulnerability scans, and penetration tests, and developing incident response procedures to quickly address threats. This is all part of taking a proactive approach to identifying and mitigating cyber incidents, reducing the impact that adverse events have on financial stability and consumer confidence.
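
For teams starting out, even a simple scheduled scan beats none. The sketch below shows one illustrative approach using nmap's "vuln" NSE script category against a placeholder segment; production programs would use a dedicated vulnerability scanner with authenticated scans and ticketing integration.

```python
# Scheduled vulnerability-scan sketch using nmap's "vuln" NSE category.
# The target segment is a placeholder; scan only assets you are
# authorized to test.
import subprocess

TARGETS = ["10.0.30.0/24"]  # example in-scope network segment

def scan(target: str) -> str:
    """Run service detection plus vuln scripts and return the report."""
    result = subprocess.run(
        ["nmap", "-sV", "--script", "vuln", target],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

for target in TARGETS:
    report = scan(target)
    if "VULNERABLE" in report:  # NSE scripts mark findings with this keyword
        print(f"Findings for {target}:\n{report}")
```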

4. Collaborate and share information with industry peers

The DORA Act encourages financial institutions to share cybersecurity threat intelligence, incident data, and best practices with industry peers, regulators, and law enforcement agencies. The ability to monitor systems and collect data will be crucial to this approach, and will require systems that can rapidly (and securely) deploy apps/services during ongoing incidents. This will help financial institutions to better understand emerging threats, coordinate responses to cyber incidents, and strengthen collective defenses against threats and operational disruptions.

5. Segment physical and logical systems to pass regular audits

Through the DORA Act, regulators are empowered to conduct regular assessments, audits, and inspections of systems. This will ensure that financial institutions are implementing adequate controls and safeguards to protect against cyber threats and operational disruptions. A crucial part of this involves physical and logical separation of systems, such as through Isolated Management Infrastructure, as well as implementing zero trust architecture across the organization. These measures help bolster resilience by eliminating control dependencies between management and production networks, which also helps to streamline audits.

Get the blueprint to help you comply with the DORA Act

DORA’s requirements are meant to help IT teams better protect sensitive data and the integrity of financial systems as a whole. But without a proper network management infrastructure, their production networks are too sensitive to errors and vulnerable to attacks. ZPE has created the blueprint that covers these 5 crucial takeaways outlined in the DORA Act. The architecture outlined in this blueprint has been trusted by Big Tech for more than a decade, as it allows them to deploy modern cybersecurity measures, physically and logically separated systems, and rapid recovery processes. Download the blueprint now.

The post DORA Act: 5 Takeaways For The Financial Sector appeared first on ZPE Systems.

]]>
What to do if You’re Ransomware’d: A Healthcare Example https://zpesystems.com/what-to-do-if-youre-ransomwared-a-healthcare-example/ Wed, 28 Feb 2024 20:09:00 +0000 https://zpesystems.com/?p=39564 Cybersecurity expert James Cabe discusses what to do if you're on the receiving end of a ransomware attack, including isolating systems.

The post What to do if You’re Ransomware’d: A Healthcare Example appeared first on ZPE Systems.

]]>

This article was written by James Cabe, CISSP, a 30-year cybersecurity expert who’s helped major companies including Microsoft and Fortinet.

Ransomware gangs target the innocent and vulnerable. They hit a Chicago hospital in December 2023, a London hospital in October the same year, and schools and hospitals in New Jersey as recently as January 2024. This is one of the biggest reasons I’m committed to stopping these criminals by educating organizations on how to re-think and re-architect their approach to cybersecurity.

In previous articles, I discussed IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environments), and how they could have quickly altered outcomes for MGM, Ragnar Locker victims, and organizations affected by the MOVEit vulnerability. Using IMI and IRE, organizations find that the key to not only speedy recovery, but also to limiting the blast radius and attack persistence, is isolation.

Why is isolation (not segmentation) key to ransomware recovery?

The NIST framework for incident response has five steps: Identify, Protect, Detect, Respond, and Recover. It’s missing a crucial step, however: Isolate. Stay tuned for a full breakdown of this in my next article. The reason isolation is so critical is that attacks move at machine speed and are extremely pervasive and persistent. If your management network is not fully isolated from production assets, the infection spreads to everything. Suddenly, you’re locked out completely and looking at months of tedious recovery. For healthcare providers, this jeopardizes everything from patient care to regulatory compliance.

Isolation is integral to building a resilience system, or in other words, a system that gives you more than basic serial console/out-of-band access and instead provides an entire infrastructure dedicated to keeping you in control of your systems — be it during a ransomware attack, ISP outage, natural disaster, etc. Because this infrastructure is physically and virtually isolated from production (no dependencies on production switches/routers, no open management ports, etc.), it’s nearly impossible for attackers to lock you out.

So, what really should you do if you’re ransomware’d? Let’s walk through an example attack on a healthcare system, and compare the traditional DR (Disaster Recovery) response to the IMI/IRE approach.

Ransomware in Healthcare: Disaster Recovery vs Isolated Recovery

Suppose you’re in charge of a hospital’s network. MDIoT, patient databases, and DICOM storage are the crown jewels of your infrastructure. Suddenly, you discover ransomware has encrypted patient records and is likely spreading quickly to other crown jewel assets. The risks and potential fallout can’t be overstated. Millions of people are depending on you to protect their sensitive info, while the hospital is depending on you to help it avoid regulatory and legal penalties and ensure it can continue operating.

The problem with Disaster Recovery

Though the word ‘recovery’ is in the name, the DR approach is limited in its capacity to recover systems during an attack. Disaster Recovery typically employs a couple of things:

  • Backups, which are copies of data, configurations, and code that are used to restore a production system when it fails.
  • Redundancy, which involves duplicating critical systems, services, and applications as a failsafe in the event that primaries go down (think cellular failover devices, secondary firewalls, etc.).

What happens when you activate your DR processes? It’s highly likely that you won’t be able to, and that’s because the typical DR setup relies on the production network. There’s no isolation.

Think about it this way: your backup servers need direct access to the data they’re backing up. If your file servers get pwned, your backup servers will, too. If your primary firewall gets hacked, your secondary will, too. The problem with backup and redundancy systems — and any system, for that matter — is that when they depend on the underlying infrastructure to remain operational, they’re just as susceptible to outages and attacks. It’s like having a reserve parachute that depends on the main parachute.

And what about the rest of your systems? You just discovered the attack has encrypted your servers and is quickly bringing operations to a crawl. How are you going to get in and fight back? What if you try to log into your management network, only to find that you’re locked out? All of your tools, configurations, and capabilities have been compromised.

This is why CISA, the FBI, US Navy, and other agencies recommend implementing Isolated Management Infrastructure.

IMI and IRE guarantee you can fight back against ransomware

You discover that the ransomware has spread. Not only has it encrypted data and stopped operations, but it has also locked you out of your own management network and is affecting the software configurations throughout the hospital. This is where IMI (Isolated Management Infrastructure) and IRE (Isolated Recovery Environment) come in.

Because IMI is physically separate from affected systems, it guarantees management access so teams can set up communication and a temporary ‘war room’ for incident response. The IRE can then be created using a combination of cellular, compute, connectivity, and power control (see diagram for design and steps). Docker containers should be used to bring up each step.

Diagram showing a chart containing the systems and open-source tools that can be deployed for an Isolated Recovery Environment

Image: The infrastructure and incident response protocol involved in the Isolated Recovery Environment. These products were chosen from free or open-source projects that have proven to be very useful in each of these stages of recovery. They can be automated in pieces for each phase, then brought down via Docker container to eliminate the risk of leakage during each phase.

Without diving too far into the technicalities, the IRE enables you to recover survivable data, restore software configurations, and prevent reinfection. Here are some things you can do (and should do) in this scenario, courtesy of the IRE:

Establish your war room

You can’t fight ransomware if you can’t securely communicate with your team. Use the IRE to create offline, break-the-glass accounts that are not attached to email. This allows you to communicate and set up ticketing for forensics purposes.

Isolate affected systems

There’s no use running antivirus if reinfection can occur. Use the IRE to take the switch that connects the backup and file servers offline. Isolate these servers from each other and shut down direct backup ports. Then, you can remote-in (KVM, iKVM, iDRAC) to run antivirus and EDR (Endpoint Detection and Response). A sketch of the port-isolation step follows below.
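
As an illustration of that containment step, the Python sketch below uses the netmiko SSH library, reached over the isolated management network, to administratively shut down the switch ports connecting the backup and file servers. The platform, addresses, credentials, and port names are all placeholders.

```python
# Containment sketch: remotely shut down the switch ports that connect
# the backup and file servers, via out-of-band access. Device details
# are placeholders; netmiko is a common multi-vendor SSH library.
from netmiko import ConnectHandler

SWITCH = {
    "device_type": "cisco_ios",   # assumed platform
    "host": "192.168.100.20",     # reached via the isolated mgmt network
    "username": "breakglass",
    "password": "********",
}
PORTS_TO_ISOLATE = ["GigabitEthernet0/1", "GigabitEthernet0/2"]

with ConnectHandler(**SWITCH) as conn:
    for port in PORTS_TO_ISOLATE:
        # 'shutdown' administratively disables the interface
        conn.send_config_set([f"interface {port}", "shutdown"])
        print(f"Isolated {port}")
```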

Restore data and device images

The key is to have backup data at its most current, both for patient data and device/software configurations. Because the IRE provides an isolated environment, and you’ve already pulled your backups offline, you can gradually restore data, re-image devices, and restore configurations without risking reinfection. The IRE ensures devices “keep away” from each other until they can be cleansed and recovered.
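
The restore itself can be as simple as staged copies over the isolated network. Below is a minimal sketch using rsync to push one dataset at a time from read-only backup media to a re-imaged host inside the IRE; the hosts, paths, and dataset names are placeholders, and each dataset should be verified clean before anything reconnects to production.

```python
# Staged-restore sketch: push clean backup data to a re-imaged host over
# the isolated recovery network, one dataset at a time. Paths and hosts
# are placeholders; rsync must be installed on both ends.
import subprocess

BACKUP_ROOT = "/mnt/offline_backup"        # backup media, mounted read-only
RESTORE_TARGET = "restore@192.168.200.30"  # host inside the IRE
DATASETS = ["patient_records", "dicom_store"]

for dataset in DATASETS:
    subprocess.run(
        ["rsync", "-a", "--checksum",
         f"{BACKUP_ROOT}/{dataset}/",
         f"{RESTORE_TARGET}:/data/{dataset}/"],
        check=True,
    )
    print(f"Restored {dataset}; verify before reconnecting to production")
```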

Things You’ll Need To Build The IMI and IRE

Network Automation Blueprint

We’ve created a comprehensive blueprint that shows how to implement the architecture for IMI and IRE. Don’t let the name fool you. The Network Automation Blueprint covers everything from establishing a dedicated management network, to automating deployment of services for ransomware recovery. Get your PDF copy now at the link below.

Gen 3 Console Servers To Replace End-of-Life Gear

It’s nearly impossible to build the IMI or deploy the IRE using older console servers. That’s because these only give you basic remote access and a hint of automation capabilities. You’ll still need the ability to run VMs and containers. Gen 3 console servers give you everything needed for IMI and IRE, like full control plane/data plane separation, hosting apps, and deploying VMs/containers on-demand. They’ve also been validated by Synopsys and have built-in security features I’ve been talking about for years. Check out the link below for resources about Gen 3 and how we’ll help you upgrade.

Get in touch with me!

I’d love to talk with you about IMI, IRE, and resilience systems. These are becoming more crucial to operational resilience and ransomware recovery, and countries are passing new regulations that will require these approaches. Get in touch with me via social media to talk about this!

The post What to do if You’re Ransomware’d: A Healthcare Example appeared first on ZPE Systems.

]]>
Best Network Performance Monitoring Tools https://zpesystems.com/best-network-performance-monitoring-tools-zs/ Wed, 15 Nov 2023 07:00:00 +0000 https://zpesystems.com/?p=38264 This guide compares three of the best network performance monitoring tools by analyzing the most critical factors and discussing the pros and cons.

The post Best Network Performance Monitoring Tools appeared first on ZPE Systems.

]]>
Network performance monitoring tools provide visibility into the health and efficiency of networks and their underlying infrastructure of devices and software. Some platforms focus entirely on collecting and analyzing logs from various sources on the network, while others provide additional management capabilities that let you control, change, and troubleshoot network infrastructure. Choosing the right solution requires a thoughtful consideration of factors such as the cost, scalability, and interoperability of the software, as well as your team’s experience and abilities. This guide compares three of the best network performance monitoring tools by analyzing these critical factors before providing advice on the most scalable and cost-effective way to deploy your solutions.

Comparing the best network performance monitoring tools

SolarWinds Network Performance Monitor (NPM) key features:
  • Network device, performance, and fault monitoring
  • Deep packet inspection and analysis
  • LAN and WAN monitoring
  • Automatic network discovery, mapping, and monitoring
  • Network availability monitoring
  • Network diagnostics
  • Network path analysis
  • Network performance testing
  • SNMP monitoring
  • Wi-Fi analysis

Kentik key features:
  • Network telemetry dashboards
  • Multi-vendor network monitoring
  • Cloud, edge, and hybrid cloud monitoring
  • SaaS application performance and uptime monitoring
  • Intelligent automated alerts
  • SNMP, traffic flow, VPC, host agent, and synthetic monitoring
  • Multi-cloud performance monitoring
  • Kubernetes workload monitoring
  • SD-WAN monitoring
  • Network security monitoring
  • Network map visualizations
  • QoE monitoring

ThousandEyes key features:
  • Network availability and performance testing
  • WAN performance monitoring
  • Cisco SD-WAN monitoring and optimization
  • Browser session monitoring
  • Network path visibility
  • User Wi-Fi connectivity monitoring
  • VPN mapping and monitoring
  • Cross-layer data visualizations

Disclaimer: This comparison was written by a 3rd party in collaboration with ZPE Systems using data gathered from publicly available data sheets and admin guides, as of 10/20/2023. Please email us if you have corrections or edits, or want to review additional attributes: Matrix@zpesystems.com

SolarWinds Network Performance Monitor (NPM)

The Network Performance Monitor (NPM) is part of the SolarWinds Orion platform of integrated products. This mature and richly featured monitoring software is delivered as a cloud-based service and can observe SaaS (software as a service), cloud, hybrid cloud, and on-premises infrastructure. With advanced features like deep packet inspection (DPI), WAN optimization monitoring, automatic network mapping, and automated diagnostic tools, SolarWinds NPM is meant to be a complete, enterprise-grade observability solution. As part of the Orion platform, it’s also extensible with other products from the SolarWinds ecosystem, such as a Network Configuration Manager.

As an enterprise solution, SolarWinds NPM comes with a high price tag that grows even larger as additional monitoring agents are added, limiting the scalability. Another important factor to consider is that SolarWinds recently suffered a high-profile hack that compromised thousands of customers, so there are security risks involved in trusting the Orion supply chain. Additionally, despite a large library of integrations, SolarWinds is a closed ecosystem that doesn’t work well with 3rd-party tools or custom scripts.

Pros:
  • Supports SaaS, cloud, and on-premises networks
  • Includes advanced monitoring features like DPI
  • Part of a large ecosystem of observability and management solutions

Cons:
  • Pricing is expensive and limits scalability
  • Recently suffered a high-profile breach that impacted thousands of customers
  • Closed ecosystem may not support your 3rd-party tools

Kentik

Kentik is an end-to-end network observability platform for cloud, multi-cloud, hybrid cloud, SaaS, and data center infrastructure. In addition to network performance monitoring, the platform includes monitoring solutions for SaaS application performance and SD-WAN performance. Other observability features include SaaS uptime monitoring, AI-driven insights and alerts, network security monitoring, and QoE (Quality of Experience) monitoring. Kentik also recently launched a Kubernetes network monitoring solution called Kentik Kube that provides end-to-end cluster visibility. Overall, Kentik is a powerful network observability platform that includes many of its most innovative features in its “Essentials” and “Pro” pricing packages, providing a lot of bang for your buck. The downside is that you can’t subscribe to features individually and must purchase a whole package, meaning you could end up paying for features you don’t need. Because Kentik is not a large vendor, its customer service may be slow to respond in some cases. Additionally, although Kentik does have a large library of integrations, it is not a vendor-neutral platform.

Pros:
  • Supports cloud, multi-cloud, hybrid cloud, SaaS, and data center infrastructure
  • Includes many advanced features and solutions at no additional cost
  • Provides AI-driven network insights and intelligent alerts

Cons:
  • Products aren’t available a la carte
  • Customer service and technical support can be slow to respond
  • Isn’t entirely vendor-neutral

ThousandEyes

ThousandEyes is a digital experience monitoring platform primarily focused on network and application synthetic testing, end-user performance monitoring, and ISP Internet monitoring for SaaS, cloud, and on-premises networks. Additionally, ThousandEyes is part of the Cisco family and can be used to monitor and optimize Cisco SD-WAN architectures. Across its family of observability products, ThousandEyes includes features like wireless network visibility, SaaS performance visualizations, cloud application outage detection, and SD-WAN performance forecasting. The major advantage of the ThousandEyes platform is that it provides true end-to-end visibility of the entire service delivery chain, including end-user device performance and third-party provider availability. One downside is that the endpoint agent-based monitoring solution requires on-premises VMs to run, which can be cumbersome to maintain and limits scalability. The pricing is expensive compared to similar solutions, and you may have to combine products to get all the features you need. Additionally, ThousandEyes is not a vendor-neutral platform and has a relatively small library of integrations.

Pros:
  • Supports SaaS, cloud, and on-premises networks
  • Works with Cisco DNA software for SD-WAN monitoring
  • Provides end-to-end visibility of the entire service delivery chain

Cons:
  • Agent-based monitoring requires on-premises VMs, limiting scalability
  • Pricing is expensive compared to similar solutions
  • Limited integrations, preventing interoperability

Conclusion

Each of the solutions on this list has advantages that make it well-suited to certain environments, as well as limitations to consider. SolarWinds NPM is part of a large ecosystem of observability and management solutions that includes advanced features like DPI, but it recently suffered a major security incident and has a closed ecosystem. Kentik packs a lot of innovative, AI-driven monitoring capabilities into its platform offerings, but its pricing tiers are inflexible, and it doesn’t have the large, enterprise-grade support team of its bigger competitors. ThousandEyes provides end-to-end visibility of the entire service delivery chain and works seamlessly with Cisco DNA software, but its agent-based monitoring limits scalability and its library of integrations is relatively small.

How to run the best network performance monitoring tools

Most network performance monitoring tools – even cloud-based SaaS offerings – communicate with endpoint agents using software deployed on VMs (virtual machines) running on-premises in each business location. Running these VMs on fully provisioned servers or PCs is expensive, but deploying them on NUCs is highly insecure, especially as organizations scale out with distributed branches and edge computing sites.

What’s needed is a consolidated hardware solution that combines critical branch, edge, and data center networking functionality with vendor-neutral VM and application hosting, such as the Nodegrid platform from ZPE Systems. Nodegrid’s serial switches and network edge routers run the open, Linux-based Nodegrid OS, which can host your choice of third-party software – including Docker containers – for network performance monitoring, SD-WAN, security, automation, and more.

Nodegrid’s versatile, modular hardware solutions also provide out-of-band (OOB) management access to critical remote infrastructure and monitoring solutions, giving teams a lifeline to recover from outages and ransomware attacks. Nodegrid uses innovative, enterprise-grade security features like Secure Boot, self-encrypted disk, and two-factor authentication (2FA), and its onboard software is frequently patched for vulnerabilities to defend against a breach. Deploying Nodegrid at each business site consolidates your network to reduce hardware overhead, streamlining management and enabling easy scalability.
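
For teams running their monitoring stack this way, deploying an agent can be as simple as launching a container on the hosting device. The sketch below uses the standard Docker SDK for Python with a hypothetical agent image; the image name, container settings, and host-networking choice are illustrative assumptions, not a prescribed configuration.

```python
# Sketch: launching a monitoring agent as a Docker container on a device
# that supports container hosting. The image and settings are placeholders.
import docker

client = docker.from_env()
container = client.containers.run(
    "example/netmon-agent:latest",   # hypothetical monitoring agent image
    name="netmon-agent",
    detach=True,
    restart_policy={"Name": "always"},
    network_mode="host",             # agent sees the device's interfaces
)
print(f"Started {container.name} ({container.short_id})")
```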

Deploy the best network performance monitoring tools with Nodegrid

Reach out to ZPE Systems to see a demo of how the best network performance monitoring tools run on the Nodegrid platform.
Contact Us

The post Best Network Performance Monitoring Tools appeared first on ZPE Systems.

]]>