Data Center Resilience Archives - ZPE Systems

3 Reasons to Use Starlink for Out-of-Band (and How to Set it Up)

Jordan Baker — Fri, 06 Sep 2024 21:14:53 +0000

Most organizations rely on critical IT in order to serve their essential business functions. A reliable method to maintain critical IT is to use dedicated out-of-band (OOB) management networks, which traditionally have relied on plain old telephone service (POTS) lines or dedicated telephony circuits for remote access. However, these traditional links come with high costs, lots of complexity, and slow performance, which make them difficult to deploy and maintain.

Enter Starlink, a satellite-based Internet service that offers a cost-effective and scalable alternative for out-of-band remote access. This post discusses how Starlink solves these common problems and gives you a free guide that walks you through the setup process.

Problem: POTS and Telephony Lines Are Expensive

For decades, IT professionals have relied on POTS and telephony lines for OOB management, mainly because these lines remain operational even when the primary data network goes down. A major problem is that POTS lines are increasingly expensive to install and maintain, particularly in remote or rural areas. Additionally, 4G/5G LTE options aren’t always viable because of coverage limitations or the lack of large enough data plans. The shift towards VoIP (Voice over IP) and digital communications has made POTS lines even less relevant, with many service providers phasing out support. This leaves businesses with fewer options and higher costs for maintaining these legacy systems.

Solution: Starlink is Cost-Effective

Starlink offers a much more cost-effective solution. You can use off-the-shelf routers to set up an OOB management network for a fraction of the cost of traditional methods. Starlink also has a relatively low monthly subscription fee and straightforward pricing model, which make it easy to budget and plan IT expenditures. If components fail or break, you can typically repair or replace them yourself to get back up and running quickly.

Figure 1: Starlink requires only a dish, router, and a few other components, making it a cost-effective alternative to expensive POTS lines.

Problem: Traditional Lines Are Difficult To Scale

Traditional POTS-based systems are notoriously difficult to scale, often requiring significant infrastructure investments and complex configurations. Copper wiring is expensive to install and maintain, and as more connections come online, switching systems must be upgraded. On top of this, POTS lines are being phased out, which means there are fewer resources being devoted to scaling and maintaining them.

Solution: Starlink is Simple to Set Up and Scale

Starlink entirely eliminates the need for telephony lines, and is a simple and scalable solution for OOB remote access. You can find the full list of components in our setup guide below, but with a Starlink terminal, compatible router, and minimal configuration, you can scale your OOB network wherever you have Starlink coverage. This ease-of-use extends to day-to-day management as well. Starlink’s satellite service offers global coverage, meaning you can manage your network devices, servers, and other critical infrastructure from virtually anywhere in the world.

Figure 2: Starlink comes with a straightforward out-of-box experience and step-by-step instructions. You can set up an out-of-band network in about one hour.

Problem: POTS Lines Lack Performance

POTS is designed primarily for voice communication and offers extremely limited bandwidth. It can’t support modern data services (such as video or high-speed internet) efficiently. As out-of-band management advances with data and video monitoring capabilities (such as AI computer vision), POTS infrastructure just doesn’t have the bandwidth to keep up.

Solution: Starlink Meets Modern Performance Requirements

Starlink provides high-speed internet, at speeds that typically range from 50 to 200 Mbps. The connection handles much larger volumes of data than POTS lines can, and Starlink’s low-Earth orbit satellites reduce latency to as low as 25 ms, compared to the typical 150 ms of POTS lines. Out-of-band using Starlink means that IT teams can manage more systems and data, and have a more responsive experience, whether they’re managing edge routers across their bank branches or monitoring the cooling systems in their distributed colocations.
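To put the bandwidth gap in perspective, here is a rough back-of-the-envelope sketch in Python. The 100 MB firmware image is a hypothetical example, and the 56 kbps figure is the nominal ceiling of a V.92 analog modem on a POTS line. The calculation ignores protocol overhead and latency, so treat the results as best-case estimates rather than measurements.

```python
# Rough transfer-time comparison. The file size is a hypothetical example,
# and real-world throughput will be lower than these ideal figures.

def transfer_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Ideal time to move a file, ignoring protocol overhead and latency."""
    size_megabits = size_megabytes * 8
    return size_megabits / link_mbps

FIRMWARE_MB = 100          # hypothetical firmware image size
DIALUP_MBPS = 0.056        # 56 kbps, nominal ceiling of a V.92 analog modem
STARLINK_LOW_MBPS = 50     # low end of the range quoted in this post
STARLINK_HIGH_MBPS = 200   # high end of the range quoted in this post

links = [
    ("dial-up modem", DIALUP_MBPS),
    ("Starlink (low)", STARLINK_LOW_MBPS),
    ("Starlink (high)", STARLINK_HIGH_MBPS),
]

for label, rate in links:
    print(f"{label:>16}: {transfer_seconds(FIRMWARE_MB, rate):,.0f} s")
```

Under these assumptions, the same image that takes about four hours over a dial-up modem moves in well under a minute over Starlink.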

Figure 3: Starlink provides high-speed connectivity, with speeds ranging from 50 to 200Mbps.

Get Started With Starlink Using Our Setup Guide

We created this step-by-step walkthrough that shows how to set up Starlink for out-of-band. It instructs how to connect the components according to a wiring diagram, configure your ZPE Nodegrid hardware, and test your connection performance using free tools. Read it now using the button below.

Get Starlink Setup Guide

The post 3 Reasons to Use Starlink for Out-of-Band (and How to Set it Up) appeared first on ZPE Systems.

What is Passive Optical Networking?

Jordan Baker — Fri, 06 Sep 2024 20:02:49 +0000

What is Passive Optical Networking (PON)?

Passive optical networking (PON) is a high-speed broadband technology that enables the delivery of multiple services over a single fiber optic cable. XGS-PON – 10G Symmetrical PON – offers speeds of up to 10 Gbps downstream and 10 Gbps upstream (hence the term ‘symmetrical’), making it ideal for applications such as video streaming, online gaming, and cloud computing.

What Problems Does PON Solve for Out-of-Band Management?

PON addresses the issue of efficiency in terms of both uplink costs and bandwidth usage. Traditional POTS lines and dedicated circuits rely on legacy infrastructure that requires regular maintenance. This infrastructure must scale as more out-of-band devices are added to the network, which increases costs and energy consumption. On top of this, using a 10G uplink for a serial console’s 10K traffic is like throwing away more than 99% of that bandwidth. Per Gartner’s Market Guide for Optical Transport Systems report (published 20 November 2023), the best way to “lower cost and energy per transported bit” is by using technologies such as passive optical networking.

Because PON uses passive optical splitters that have no moving parts or powered components between the central hub and end users, PON is much more efficient for deploying serial consoles close to target assets. These out-of-band devices can be deployed in large quantities and close to the network edge, with up to 256 devices sharing one uplink. This reduces cabling and power requirements, and is ideal for MSP and campus operators, where there are many out-of-band devices distributed over long distances.
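The utilization argument is easy to check with a few lines of Python. This is only a sketch: it reads the "10K" figure as roughly 10 kbps of traffic per serial console, which is an assumption, and it uses the 256-device limit mentioned above.

```python
# Uplink utilization sketch. Reading "10K" as ~10 kbps per console is an
# assumption made for illustration only.

UPLINK_BPS = 10_000_000_000   # 10 Gbps XGS-PON uplink
CONSOLE_BPS = 10_000          # assumed ~10 kbps of serial console traffic

def utilization_percent(devices: int,
                        per_device_bps: int = CONSOLE_BPS,
                        uplink_bps: int = UPLINK_BPS) -> float:
    """Share of the uplink consumed by a number of devices, in percent."""
    return 100 * devices * per_device_bps / uplink_bps

for devices in (1, 256):
    print(f"{devices:>3} device(s): {utilization_percent(devices):.4f}% of the uplink")
```

Even with all 256 devices active at once, the shared uplink stays far below 1% utilization under these assumptions, which is why sharing one fiber among many out-of-band devices makes sense.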

More About PON: GPON and XGS-PON Technologies

Passive Optical Networking (PON) leverages time-division multiplexing (TDM) and different wavelengths of light to transmit and receive data on a single fiber strand. This allows efficient communication among up to 256 devices over a single fiber. Initially developed for fiber-to-the-home (FTTH) deployments, PON technology has evolved to facilitate the addition of network nodes with minimal infrastructure changes. GPON (gigabit-capable PON) and XGS-PON use different wavelengths for upstream and downstream data transmission. The upstream headend, known as the Optical Line Terminal (OLT), manages and coordinates the time slots allocated to downstream Optical Network Units (ONUs) for data transmission.
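The time-slot coordination described above can be sketched in Python. This is a deliberately simplified illustration, not how any particular OLT works: it splits one upstream frame evenly among ONUs, whereas real headends typically allocate bandwidth dynamically. The `Grant` class, the function name, and the frame and guard values are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Grant:
    onu_id: int
    start_us: int    # offset from the start of the upstream frame
    length_us: int   # how long this ONU may transmit

def allocate_upstream_slots(onu_ids, frame_us=125, guard_us=1):
    """Split one upstream frame evenly among ONUs.

    A guard gap is left at the end of each slot so bursts from
    different ONUs do not collide on the shared fiber.
    """
    if not onu_ids:
        return []
    slot_us = frame_us // len(onu_ids)
    if slot_us <= guard_us:
        raise ValueError("too many ONUs for this frame length")
    return [
        Grant(onu_id, index * slot_us, slot_us - guard_us)
        for index, onu_id in enumerate(onu_ids)
    ]

for grant in allocate_upstream_slots([1, 2, 3, 4]):
    print(grant)
```

Because each ONU transmits only inside its own grant, many devices can share one upstream wavelength without interfering with each other.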

GPON and XGS-PON Support on ZPE Systems’ Nodegrid SR Gateway

ZPE Systems’ Nodegrid SR appliances, which are used as out-of-band access nodes or complete branch gateways, now support GPON and XGS-PON technology (patent pending) via SFP and SFP+ ports. The Nodegrid SR family is offered in multiple form factors to be right-sized for deployments in branch offices, factories, smart buildings, and industrial environments (such as for SCADA).

Having support for GPON and XGS-PON means network engineers now have a flexible choice of high-speed uplink technologies. This versatility makes the Nodegrid SR gateway suitable for edge deployments, where it can establish an OOBI-WAN (out-of-band infrastructure WAN) link, and for data centers, where it enhances uplink efficiency. Given the low bandwidth requirements of serial console and out-of-band communications, PON technology is well-suited for these applications. A single fiber strand can be shared among hundreds of out-of-band and serial console devices using passive optical splitters. Organizations can deploy out-of-band devices close to the racks and edges of the network in a cost- and energy-efficient manner. Additionally, ZPE devices support ONU SFPs compatible with third-party OLT headends, ensuring broad interoperability and integration.

Benefits of Using XGS-PON with ZPE Systems’ Nodegrid SR Gateway

The benefits of using XGS-PON with ZPE Systems’ Nodegrid SR gateway include:

High-Speed Connectivity: XGS-PON delivers symmetrical speeds of up to 10 Gbps, making it ideal for high-bandwidth applications like video streaming, online gaming, and cloud computing. This ensures consistent and high-quality service for end-users.
Cost-Effectiveness: Deploying XGS-PON is a cost-effective solution for delivering high-speed broadband services, especially in scenarios where upgrading existing infrastructure may be challenging.
Scalability: The Nodegrid SR Gateway, acting as an ONU, can connect up to 256 serial consoles through a single fiber strand. PON’s use of separate upstream and downstream wavelengths, together with TDM, enables multiple devices to share the same fiber strand efficiently. Optical splitters, which require no external power, facilitate the sharing of fiber between multiple ONUs, which makes scaling much more cost- and energy-efficient.
Reliability: The Nodegrid SR gateway is proven by service providers worldwide. Its robust design and compatibility with various network configurations make it a reliable choice for delivering high-quality broadband services.

Figure 1: ZPE Nodegrid SR gateway with XGS-PON ONU support

XGS-PON Enhances Efficiency of Out-of-Band

XGS-PON is a significant advancement over traditional, copper-based uplinks. The integration of XGS-PON support in the ZPE Systems Nodegrid SR Gateway allows network architects to deploy a dedicated out-of-band ring that is not only high-speed but also cost-effective, energy-efficient, and capable of covering longer distances. PON technology, with its ability to handle the lower data rates of out-of-band transmissions, is an ideal uplink medium for serial console transmission. The combination of XGS-PON and the Nodegrid SR Gateway provides a powerful and flexible solution for modern network infrastructure.

Be one of the first to try PON on the Nodegrid SR Gateway

Set up a demo for a deeper dive into PON use cases and how it can benefit your organization.

Schedule a demo

Comparing Console Server Hardware

ZPE Systems — Wed, 04 Sep 2024 17:03:31 +0000

Console servers – also known as serial consoles, console server switches, serial console servers, serial console routers, or terminal servers – are critical for data center infrastructure management. They give administrators a single point of control for devices like servers, switches, and power distribution units (PDUs) so they don’t need to log in to each piece of equipment individually. They also use multiple network interfaces to provide out-of-band (OOB) management, which creates an isolated network dedicated to infrastructure orchestration and troubleshooting. This OOB network remains accessible during production network outages, offering remote teams a lifeline to recover systems without costly and time-consuming on-site visits.

Console server hardware can vary significantly across different vendors and use cases. This guide compares console server hardware from the three top vendors and examines four key categories: large data centers, mixed environments, break-fix deployments, and modular solutions.

Console server hardware for large data center deployments

Large and hyperscale data centers can include hundreds or even thousands of individual devices to manage. Teams typically use infrastructure automation, like infrastructure as code (IaC), because managing devices at such a large scale is impossible to do manually. The best console server hardware for high-density data centers will include plenty of managed serial ports, support hundreds of concurrent sessions, and provide support for infrastructure automation.

Click here to compare the hardware specs of the top providers, or read below for more information.

Nodegrid Serial Console Plus (NSCP)

The Nodegrid Serial Console Plus (NSCP) from ZPE Systems is the only console server providing up to 96 RS-232 serial ports in a 1U rack-mounted form factor. Its quad-core Intel processor, robust and upgradable internal storage and RAM options, and Linux-based Nodegrid OS support Guest OS and Docker containers for third-party applications. That means the NSCP can directly host infrastructure automation (like Ansible, Puppet, and Chef), security (like Palo Alto’s next-generation firewalls and Secure Access Service Edge), and much more. Plus, it can extend zero-touch provisioning (ZTP) to legacy and mixed-vendor devices that otherwise wouldn’t support automation.

The NSCP also comes packed with hardware security features including BIOS protection, UEFI Secure Boot, self-encrypted disk (SED), Trusted Platform Module (TPM) 2.0, and a multi-site VPN using IPSec, WireGuard, and OpenSSL protocols. Plus, it supports a wide range of USB environmental monitoring sensors to help remote teams control conditions in the data center or colocation facility.

Advantages:

Up to 96 managed serial ports in a 1U appliance
Intel x86 CPU and 4GB of RAM for 3rd-party Docker and VM apps
Extends ZTP and automation to legacy and mixed-vendor infrastructure
Robust on-board security features like BIOS protection and TPM 2.0
Supports a wide range of USB environmental monitoring sensors
Wi-Fi and 5G/4G LTE options available
Supports over 1,000 concurrent sessions

Disadvantages:

USB ports limited on 96-port model

Opengear CM8100

The Opengear CM8100 comes in two models: the 1G version includes up to 48 managed serial ports, while the 10G version supports up to 96 serial ports in a 2U form factor. Both models have a dual-core ARM Cortex processor and 2GB of RAM, allowing for some automation support with upgraded versions of the Lighthouse management software. They also come with an embedded firewall, IPSec and OpenVPN protocols for a single-site VPN, and TPM 2.0 security.

Advantages:

10G model comes with software-selectable serial ports
Supports OpenVPN and IPSec VPNs
Fast port speeds

Disadvantages:

Automation and ZTP require Lighthouse software upgrade
No cellular or Wi-Fi options
96-port model requires 2U of rack space

Perle IOLAN SCG (fixed)

The IOLAN SCG is Perle’s fixed-form-factor console server solution. It supports up to 48 managed serial ports and can extend ZTP to end devices. It comes with onboard security features including an embedded firewall, OpenVPN and IPSec VPN, and AES encryption. However, the IOLAN SCG’s underpowered single-core ARM processor, 1GB of RAM, and 4GB of storage limit its automation capabilities, and it does not integrate with any third-party automation or orchestration solutions.

Advantages:

Supports ZTP for end devices
Comprehensive firewall functionality

Disadvantages:

Very limited CPU, RAM, and flash storage
Does not support third-party automation

Comparison Table: Console Server Hardware for Large Data Centers

	Nodegrid NSCP	Opengear CM8100	Perle IOLAN SCG
Serial Ports	16 / 32 / 48 / 96x RS-232	16 / 32 / 48 / 96x RS-232	16 / 32 / 48x RS-232
Max Port Speed	230,400 bps	230,400 bps	230,000 bps
Network Interfaces	2x SFP+; 2x ETH; 1x Wi-Fi (optional); 2x Dual SIM LTE (optional)	2x ETH	1x ETH
Additional Interfaces	1x RS-232 console; 2x USB 3.0 Type A; 1x HDMI Output	1x RS-232 console; 2x USB 3.0	1x RS-232 console; 1x Micro USB w/DB9 Adapter
Environmental Monitoring	Any USB sensors	–	–
CPU	Intel x86_64 Quad-Core	ARM Cortex-A9 1.6 GHz Dual-Core	ARM 32-bit 500MHz Single-Core
Storage	32GB SSD (upgrades available)	32GB eMMC	4GB Flash
RAM	4GB DDR4 (upgrades available)	2GB DDR4	1GB
Power	Single or Dual AC; Dual DC	Dual AC; Dual DC	Single AC
Form Factor	1U Rack Mounted	1U Rack Mounted (up to 48 ports); 2U Rack Mounted (96 ports)	1U Rack Mounted
Data Sheet	Download	CM8100 1G; CM8100 10G	Download
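To see how port density affects rack space at scale, here is a small Python sketch that uses the maximum port counts and form factors from the table above. It assumes one serial port per managed device and the highest-density model from each vendor. The 500-device figure is a hypothetical example.

```python
import math

# Maximum ports per unit and rack height, taken from the comparison table.
MODELS = {
    "Nodegrid NSCP": {"ports": 96, "rack_units": 1},
    "Opengear CM8100 (10G)": {"ports": 96, "rack_units": 2},
    "Perle IOLAN SCG": {"ports": 48, "rack_units": 1},
}

def units_needed(devices: int, ports_per_unit: int) -> int:
    """Number of console servers needed, assuming one port per device."""
    return math.ceil(devices / ports_per_unit)

def rack_space(devices: int) -> dict:
    """Total rack units consumed per model for a given device count."""
    return {
        name: units_needed(devices, spec["ports"]) * spec["rack_units"]
        for name, spec in MODELS.items()
    }

DEVICES = 500  # hypothetical number of managed devices
for name, total_u in rack_space(DEVICES).items():
    print(f"{name:>22}: {total_u}U of rack space")
```

The sketch only covers rack space. Concurrent session limits, automation support, and network interfaces matter just as much when sizing a real deployment.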

Console server hardware for mixed environments

Data center deployments that include a mix of legacy and modern solutions from multiple vendors benefit from console server hardware that includes software-selectable serial ports. This feature allows administrators to manage devices with straight or rolled RS-232 pinouts from the same console server.
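As a rough illustration of what "straight" and "rolled" mean, the Python sketch below maps pin numbers on an 8-pin connector. A straight connection wires each pin to the same pin number on the other end, while a rolled connection reverses the order. The function name is hypothetical, and the sketch models only pin numbers. Signal assignments vary by vendor and are not shown.

```python
def map_pins(pinout: str, pin_count: int = 8) -> dict[int, int]:
    """Return {console server pin: device pin} for a given pinout style."""
    pins = range(1, pin_count + 1)
    if pinout == "straight":
        # Pin n connects to pin n.
        return {pin: pin for pin in pins}
    if pinout == "rolled":
        # Pin order is reversed: 1 connects to 8, 2 to 7, and so on.
        return {pin: pin_count + 1 - pin for pin in pins}
    raise ValueError(f"unknown pinout: {pinout!r}")

for style in ("straight", "rolled"):
    print(style, map_pins(style))
```

With software-selectable ports, this choice is made per port in configuration instead of by swapping cables or adapters.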

Nodegrid Serial Console S Series

The Nodegrid Serial Console S Series has up to 48 auto-sensing RS-232 serial ports and 14 high-speed managed USB ports, allowing for the control of up to 62 devices. Like the NSCP, the S Series has a quad-core Intel CPU and upgradeable storage and RAM, supporting third-party VMs and containers for automation, orchestration, security, and more. It also comes with the same robust security features to protect the management network.

Advantages:

Includes 14 high-speed managed USB ports
Intel x86 CPU and 4GB of RAM for 3rd-party Docker and VM apps
Supports a wide range of USB environmental monitoring sensors
Extends ZTP and automation to legacy and mixed-vendor infrastructure
Robust on-board security features like BIOS protection and TPM 2.0
Supports 250+ concurrent sessions

Disadvantages:

Only offers 1Gbps and Ethernet connectivity for OOB

Opengear OM2200

The Opengear OM2200 comes with 16, 32, or 48 software-selectable RS-232 ports, or, with the OM2224-24E model, 24 RS-232 and 24 managed Ethernet ports. It also includes 8 managed USB ports and the option for a V.92 analog modem. It has impressive storage space and 8GB of DDR4 RAM for automated workflows, though, as with all Opengear solutions, the upgraded version of the Lighthouse management software is required for ZTP and NetOps automation support.

Advantages:

Optional managed Ethernet ports
Optional V.92 analog modem for OOB
64GB of storage and 8GB DDR4 RAM

Disadvantages:

Automation and ZTP require Lighthouse software upgrade
No cellular or Wi-Fi options

Comparison Table: Console Server Hardware for Mixed Environments

	Nodegrid S Series	Opengear OM2200
Serial Ports	16 / 32 / 48x Software Selectable RS-232; 14x USB-A serial	16 / 32 / 48x Software Selectable RS-232; 8x USB 2.0 serial; (OM2224-24E) 24x Software Selectable RS-232 and 24x Managed Ethernet
Max Port Speed	230,400 bps (RS-232); 921,600 bps (USB)	230,400 bps
Network Interfaces	2x1Gbps or 2x ETH	2x SFP+ or 2x ETH; 1x V.92 modem (select models)
Additional Interfaces	1x RS-232 console; 1x USB 3.0 Type A; 1x HDMI Output	1x RS-232 console; 1x Micro USB; 2x USB 3.0
Environmental Monitoring	Any USB sensors	–
CPU	Intel x86_64 Dual-Core	AMD GX-412TC 1.4 GHz Quad-Core
Storage	32GB SSD (upgrades available)	64GB SSD
RAM	4GB DDR4 (upgrades available)	8GB DDR3
Power	Single or Dual AC; Dual DC	Dual AC; Dual DC
Form Factor	1U Rack Mounted	1U Rack Mounted
Data Sheet	Download	Download

Console server hardware for break-fix deployments

A full-featured console server solution may be too complicated and expensive for certain use cases, especially for organizations just looking for “break-fix” OOB access to remotely troubleshoot and recover from issues. The best console server hardware for this type of deployment provides fast and reliable network access to managed devices without extra features that increase the price and complexity.

Nodegrid Serial Console Core Edition (NSCP-CE)

The Nodegrid Serial Console Core Edition (NSCP-CE) provides the same hardware and security features as the NSCP, as well as ZTP, but without the advanced automation capabilities. Its streamlined management and affordable price tag make it ideal for lean, budget-conscious IT departments. And, like all Nodegrid solutions, it comes with the most comprehensive hardware security features in the industry.

Advantages:

Up to 48 managed serial ports in a 1U appliance
Extends ZTP and automation to legacy and mixed-vendor infrastructure
Robust on-board security features like BIOS protection and TPM
Supports a wide range of USB environmental monitoring sensors
Analog modem and 5G/4G LTE options available
Supports over 100 concurrent sessions

Disadvantages:

Supports automation only via ZPE Cloud

Opengear CM7100

The Opengear CM7100 is the previous generation of the CM8100 solution. Its serial and network interface options are the same, but it comes with a weaker 800 MHz Armada CPU, and there are options for smaller storage and RAM configurations to reduce the price. However, as with all Opengear console servers, the CM7100 doesn’t support ZTP without paying for an upgraded Lighthouse license.

Advantages:

Can reduce storage and RAM to save money
Supports OpenVPN and IPSec VPNs
Fast port speeds

Disadvantages:

Automation and ZTP require Lighthouse software upgrade
No cellular or Wi-Fi options
96-port model requires 2U of rack space

Comparison Table: Console Server Hardware for Break-Fix Deployments

	Nodegrid NSCP-CE	Opengear CM7100
Serial Ports	16 / 32 / 48x RS-232	16 / 32 / 48 / 96x RS-232
Max Port Speed	230,400 bps	230,400 bps
Network Interfaces	2x SFP ETH; 1x Analog modem (optional); 2x 5G/4G LTE (optional)	2x ETH
Additional Interfaces	1x RS-232 console; 2x USB 3.0 Type A	1x RS-232 console; 2x USB 2.0
Environmental Monitoring	Any USB sensors	Smoke, water leak, vibration
CPU	Intel x86_64 Dual-Core	Armada 370 ARMv7 800 MHz
Storage	16GB Flash (upgrades available)	4-64GB storage
RAM	4GB DDR4 (upgrades available)	256MB-2GB DDR3
Power	Dual AC; Dual DC	Single or Dual AC
Form Factor	1U Rack Mounted	1U Rack Mounted (up to 48 ports); 2U Rack Mounted (96 ports)
Data Sheet	Download	Download

Modular console server hardware for flexible deployments

Modular console servers allow organizations to create customized solutions tailored to their specific deployment and use case. They also support easy scaling by allowing teams to add more managed ports as the network grows, and provide the flexibility to swap out certain capabilities and customize hardware and software as the needs of the business change.

Nodegrid Net Services Router (NSR)

The Nodegrid Net Services Router (NSR) has up to five expansion bays that can support any combination of 16-port RS-232 or 16-port USB serial modules. In addition to managed ports, there are NSR modules for Ethernet (with or without PoE – Power over Ethernet) switch ports, Wi-Fi and dual-SIM cellular, additional SFP ports, extra storage, and compute.

The NSR comes with an eight-core Intel CPU and 8GB DDR4 RAM, offering the same vendor-neutral Guest OS/Docker support and onboard security features as the NSCP. It can also run virtualized network functions to consolidate an entire networking stack in a single device. This makes the NSR adaptable to nearly any deployment scenario, including hyperscale data centers, edge computing sites, and branch offices.

Advantages:

Up to 5 expansion bays provide support for up to 80 managed devices
8GB of DDR4 RAM
Robust on-board security features like BIOS protection and TPM 2.0
Supports a wide range of USB environmental monitoring sensors
Wi-Fi and 5G/4G LTE options available
Optional modules for various interfaces, extra storage, and compute

Disadvantages:

No V.92 modem support

Perle IOLAN SCG L/W/M

The Perle IOLAN SCG modular series is customizable with cellular LTE, Wi-Fi, a V.92 analog modem, or any combination of the three. It also has three expansion bays that support any combination of 16-port RS-232 or 16-port USB modules. Otherwise, this version of the IOLAN SCG comes with the same security features and hardware limitations as the fixed form factor models.

Advantages:

Cellular, Wi-Fi, and analog modem options
Supports ZTP for end devices
Comprehensive firewall functionality

Disadvantages:

Very limited CPU, RAM, and flash storage
Does not support third-party automation

Comparison Table: Modular Console Server Hardware

	Nodegrid NSR	Perle IOLAN SCG R/U
Serial Ports	16 / 32 / 48 / 64 / 80x RS-232 with up to 5 serial modules; 16 / 32 / 48 / 64 / 80x USB with up to 5 serial modules	Up to 50x RS-232/422/485; Up to 50x USB
Max Port Speed	230,400 bps	230,000 bps
Network Interfaces	1x SFP+; 1x ETH with PoE in; 1x Wi-Fi (optional); 1x Dual SIM LTE (optional)	2x SFP or 2x ETH
Additional Interfaces	1x RS-232 console; 2x USB 2.0 Type A; 2x GPIO; 2x Digital Out; 1x VGA; Optional Modules (up to 5): 16x ETH, 8x PoE+, 16x SFP, 8x SFP+, 16x USB OCP Debug	1x RS-232 console; 1x Micro USB w/DB9 adapter
Environmental Monitoring	Any USB sensors	–
CPU	Intel x86_64 Quad- or Eight-Core	ARM 32-bit 500MHz Single-Core
Storage	32GB SSD (upgrades available)	4GB Flash
RAM	8GB DDR4 (upgrades available)	1GB
Power	Dual AC; Dual DC	Dual AC; Dual DC
Form Factor	1U Rack Mounted	1U Rack Mounted
Data Sheet	Download	Download

Get the best console server hardware for your deployment with Nodegrid

The vendor-neutral Nodegrid platform provides solutions for any use case, deployment size, and pain point. Schedule a free Nodegrid demo to learn more.

Want to see Nodegrid in action?

Watch a demo of the Nodegrid Gen 3 out-of-band management solution to see how it can improve scalability for your data center architecture.

Watch a demo

Data Center Scalability Tips & Best Practices

ZPE Systems — Thu, 22 Aug 2024 17:25:32 +0000

Data center scalability is the ability to increase or decrease workloads cost-effectively and without disrupting business operations. Scalable data centers make organizations agile, enabling them to support business growth, meet changing customer needs, and weather downturns without compromising quality. This blog describes various methods for achieving data center scalability before providing tips and best practices to make scalability easier and more cost-effective to implement.

How to achieve data center scalability

There are four primary ways to scale data center infrastructure, each of which has advantages and disadvantages.

4 Data center scaling methods

Method	Description	Pros and Cons
1. Adding more servers	Also known as scaling out or horizontal scaling, this involves adding more physical or virtual machines to the data center architecture.	Pros: Can support and distribute more workloads; Eliminates hardware constraints. Cons: Deployment and replication take time; Requires more rack space; Higher upfront and operational costs.
2. Virtualization	Dividing physical hardware into multiple virtual machines (VMs) or virtual network functions (VNFs) to support more workloads per device.	Pros: Supports faster provisioning; Uses resources more efficiently; Reduces scaling costs. Cons: Transition can be expensive and disruptive; Not supported by all hardware and software.
3. Upgrading existing hardware	Also known as scaling up or vertical scaling, this involves adding more processors, memory, or storage to upgrade the capabilities of existing systems.	Pros: Implementation is usually quick and non-disruptive; More cost-effective than horizontal scaling; Requires less power and rack space. Cons: Scalability limited by server hardware constraints; Increases reliance on legacy systems.
4. Using cloud services	Moving some or all workloads to the cloud, where resources can be added or removed on-demand to meet scaling requirements.	Pros: Allows on-demand or automatic scaling; Better support for new and emerging technologies; Reduces data center costs. Cons: Migration is often extremely disruptive; Auto-scaling can lead to ballooning monthly bills; May not support legacy software.

It’s important for companies to analyze their requirements and carefully consider the advantages and disadvantages of each method before choosing a path forward.

Best practices for data center scalability

The following tips can help organizations ensure their data center infrastructure is flexible enough to support scaling by any of the above methods.

Run workloads on vendor-neutral platforms

Vendor lock-in, or a lack of interoperability with third-party solutions, can severely limit data center scalability. Using vendor-neutral platforms ensures that teams can add, expand, or integrate data center resources and capabilities regardless of provider. These platforms make it easier to adopt new technologies like artificial intelligence (AI) and machine learning (ML) while ensuring compatibility with legacy systems.

Use infrastructure automation and AIOps

Infrastructure automation technologies help teams provision and deploy data center resources quickly so companies can scale up or out with greater efficiency. They also ensure administrators can effectively manage and secure data center infrastructure as it grows in size and complexity.

For example, zero-touch provisioning (ZTP) automatically configures new devices as soon as they connect to the network, allowing remote teams to deploy new data center resources without on-site visits. Automated configuration management solutions like Ansible and Chef ensure that virtualized system configurations stay consistent and up-to-date while preventing unauthorized changes. AIOps (artificial intelligence for IT operations) uses machine learning algorithms to detect threats and other problems, remediate simple issues, and provide root-cause analysis (RCA) and other post-incident forensics with greater accuracy than traditional automation.
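The idea behind automated configuration management can be sketched in a few lines of Python: compare the desired state of a device with its actual state and report any drift. This is a generic illustration, not the API of Ansible, Chef, or any other tool. The function name and the setting names and values are hypothetical.

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (desired value, actual value)} for every mismatch.

    Settings missing from one side show up with None on that side, so
    unauthorized additions are reported along with changed values.
    """
    drift = {}
    for key in desired.keys() | actual.keys():
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift

# Hypothetical example: someone enabled Telnet outside of change control.
desired_state = {"ntp_server": "10.0.0.1", "ssh": "enabled"}
actual_state = {"ntp_server": "10.0.0.1", "ssh": "enabled", "telnet": "enabled"}

for setting, (want, have) in find_drift(desired_state, actual_state).items():
    print(f"{setting}: expected {want!r}, found {have!r}")
```

A real configuration management tool goes further by applying the desired state automatically, but detecting drift like this is the first step in keeping configurations consistent as infrastructure scales.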

Isolate the control plane with Gen 3 serial consoles

Serial consoles are devices that allow administrators to remotely manage data center infrastructure without needing to log in to each piece of equipment individually. They use out-of-band (OOB) management to separate the data plane (where production workflows occur) from the control plane (where management workflows occur). OOB serial console technology – especially the third-generation (or Gen 3) – aids data center scalability in several ways:

Gen 3 serial consoles are vendor-neutral and provide a single software platform for administrators to manage all data center devices, significantly reducing management complexity as infrastructure scales out.
Gen 3 OOB can extend automation capabilities like ZTP to mixed-vendor and legacy devices that wouldn’t otherwise support them.
OOB management moves resource-intensive infrastructure automation workflows off the data plane, improving the performance of production applications and workflows.
Serial consoles move the management interfaces for data center infrastructure to an isolated control plane, which prevents malware and cybercriminals from accessing them if the production network is breached. Isolated management infrastructure (IMI) is a security best practice for data center architectures of any size.

How Nodegrid simplifies data center scalability

Nodegrid is a Gen 3 out-of-band management solution that streamlines vertical and horizontal data center scalability.

The Nodegrid Serial Console Plus (NSCP) offers up to 96 managed ports in a 1U rack-mounted form factor, reducing the number of OOB devices needed to control large-scale data center infrastructure. Its open, x86 Linux-based OS can run VMs, VNFs, and Docker containers so teams can run virtualized workloads without deploying additional hardware. Nodegrid can also run automation, AIOps, and security on the same platform to further reduce hardware overhead.

Nodegrid OOB is also available in a modular form factor. The Net Services Router (NSR) allows teams to add or swap modules for additional compute, storage, memory, or serial ports as the data center scales up or down.

Want to see Nodegrid in action?

Watch a demo of the Nodegrid Gen 3 out-of-band management solution to see how it can improve scalability for your data center architecture.


The post Data Center Scalability Tips & Best Practices appeared first on ZPE Systems.

Understanding Serial Console Interfaces

ZPE Systems — Thu, 22 Aug 2024 07:59:02 +0000

A serial console (also known as a console server or terminal server) is a device that allows admins to manage critical network infrastructure like servers, routers, switches, and power distribution units (PDUs) without needing to log in to each piece of equipment individually. It also provides out-of-band (OOB) management, which creates an isolated network dedicated to infrastructure orchestration and troubleshooting. Serial console interfaces help improve management efficiency, accelerate recovery from outages and cyberattacks, and isolate the control plane from malicious actors.

This blog defines serial console interfaces and describes their technological evolution before discussing the benefits of using a modern serial console solution.

What is a serial console interface?

The term serial console interface could mean different things depending on the context and who’s saying it.

1. Some people use this term to refer to the serial console’s management GUI (graphical user interface), which administrators use to view and control data center devices.

2. Others use this term to refer to the individual connections between a serial console and each managed data center device. In addition to traditional RS-232 serial interfaces, a serial console may support RJ45, KVM (keyboard, video, mouse), IPMI (intelligent platform management interface), and USB (universal serial bus) interfaces.

3. Another potential (but less common) use of the term is for the text-based console interface (also known as a CLI, or command-line interface) used to configure and manage data center devices without a GUI. The console interface could be accessed in several ways, such as through a serial console’s GUI, or via a Telnet or SSH (secure shell) client like PuTTY.

4. Finally, it’s quite common to use the term serial console interface to describe the entire serial console solution, from the hardware itself to its managed ports, GUI, and CLI. The serial console acts as an interface between the production network (a.k.a., the data plane) and the management network (a.k.a., the control plane).

For the purposes of this discussion, we will use this fourth definition of serial console interfaces.

The evolution of serial console interfaces

First-generation

The first generation of serial consoles provides the basics: unified management of multiple data center devices, and an OOB network connection (such as a dial-up modem or cellular SIM card) so management workflows don’t rely on the main production network. A Gen 1 serial console interface allows administrators to access the CLI for each connected device even if the production network goes down from an ISP outage, equipment failure, or cyberattack. However, these serial consoles lack many of the advanced features required for modern network infrastructures, such as hardware encryption, third-party integrations, and automation capabilities. They typically only support standard RS-232 serial interfaces using a specific pinout.

Second-generation

The second generation added built-in security features, advanced authentication methods, and the ability to manage multi-vendor devices. Some vendors also added support for Python scripts and other automation, as well as zero-touch provisioning (ZTP) for supported end devices. However, Gen 2 serial console interfaces have closed architectures that prevent full automation of multi-vendor infrastructure. Their management GUIs are also typically only available as an on-premises virtual machine (VM), so remote administrators must be on the enterprise network or connected via VPN to access them.

Third-generation

Third-generation serial consoles are completely vendor-neutral, so they can control – and extend automation to – every physical and virtual asset in your environment. They use high-speed OOB network interfaces such as 5G cellular, and offer cloud-based management software so teams can manage and troubleshoot remote infrastructure from anywhere in the world. Gen 3 serial console interfaces are built on an open, x86 Linux-based architecture that supports third-party integrations and can run other vendors’ software. They accommodate legacy pinouts to control a variety of devices, such as PDUs, IPMI devices, and environmental monitoring sensors, and also feature modules that allow you to customize or modify interface types.

Gen 3 serial consoles have enterprise-grade security features like an encrypted disk and TPM 2.0 security. They also support integrations with Zero Trust providers for multi-factor authentication (MFA) and single sign-on (SSO). The third generation enables end-to-end network infrastructure automation using third-party tools like Ansible, Chef, and Puppet, as well as customer-built tools in VMs, Docker, or Kubernetes. Gen 3 serial console interfaces are essentially infrastructure multi-tools capable of running and deploying any solution, at any time, from anywhere.

The benefits of a Gen 3 serial console interface

The latest generation of serial consoles provides three major advantages:

Improved management efficiency. A vendor-neutral serial console allows administrators to manage infrastructure workflows and automation for large, complex network architectures from a single pane of glass. Teams can also extend automation to every infrastructure device, even legacy solutions that wouldn’t support it otherwise.

Reduced network downtime. With fast, reliable Gen 3 OOB, infrastructure teams have a lifeline to troubleshoot and recover remote infrastructure when the WAN (wide area network) or LAN (local area network) goes down. They can remotely power-cycle frozen devices, view environmental monitoring logs, and automatically provision replacement equipment without the time or expense of on-site visits.

Isolated management infrastructure (IMI). Gen 3 OOB creates an isolated control plane for network infrastructure, which helps protect management interfaces from malicious actors who have breached the production network. It also helps establish an isolated recovery environment (IRE) where teams can rebuild and restore systems without risking re-infection or re-compromise.

Want to learn more about serial consoles?

Gen 3 serial console interfaces like the Nodegrid Serial Console (NSC) from ZPE Systems use vendor-neutral architectures and end-to-end automation capabilities to help companies improve operational efficiency and network resilience. To learn more about how a Gen 3 solution can help with your biggest infrastructure pain points, watch a Nodegrid demo.


The post Understanding Serial Console Interfaces appeared first on ZPE Systems.

AI Data Center Infrastructure

ZPE Systems — Fri, 09 Aug 2024 14:00:01 +0000

Artificial intelligence is transforming business operations across nearly every industry, with a recent McKinsey global survey finding that 72% of organizations have adopted AI and 65% regularly use generative AI (GenAI) tools. GenAI and other artificial intelligence technologies are extremely resource-intensive, requiring more computational power, data storage, and energy than traditional workloads. AI data center infrastructure also requires high-speed, low-latency networking connections and unified, scalable management hardware to ensure maximum performance and availability. This post describes the key components of AI data center infrastructure before providing advice for overcoming common pitfalls to improve the efficiency of AI deployments.

AI data center infrastructure components

Computing

Generative AI and other artificial intelligence technologies require significant processing power. AI workloads typically run on graphics processing units (GPUs), which are made up of many smaller cores that perform simple, repetitive computing tasks in parallel. GPUs can be clustered together to process data for AI much faster than CPUs.

Storage

AI requires vast amounts of data for training and inference. On-premises AI data centers typically use object storage systems with solid-state disks (SSDs) composed of multiple sections of flash memory (a.k.a., flash storage). Storage solutions for AI workloads must be modular so additional capacity can be added as data needs grow, through either physical or logical (networking) connections between devices.

Networking

AI workloads are often distributed across multiple computing and storage nodes within the same data center. To prevent packet loss or delays from affecting the accuracy or performance of AI models, nodes must be connected with high-speed, low-latency networking. Additionally, high-throughput WAN connections are needed to accommodate all the data flowing in from end-users, business sites, cloud apps, IoT devices, and other sources across the enterprise.

Power

AI infrastructure uses significantly more power than traditional data center infrastructure, with a rack of three or four AI servers consuming as much energy as 30 to 40 standard servers. To prevent issues, these power demands must be accounted for in the layout design for new AI data center deployments and, if necessary, discussed with the colocation provider to ensure enough power is available.
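As a rough illustration of what that comparison implies per server, the arithmetic can be sketched as follows. The only inputs taken from the text are the 3-to-4 and 30-to-40 figures; the helper `rack_power_kw` and any baseline wattage passed to it are hypothetical planning inputs, not measured values.

```python
# Figures from the paragraph above: 3-4 AI servers draw as much as 30-40 standard servers.
ai_servers = (3, 4)
standard_equivalents = (30, 40)

# Per-server multiplier at the matched ends of the range and at the extremes.
matched = [std / ai for ai, std in zip(ai_servers, standard_equivalents)]
lowest = min(standard_equivalents) / max(ai_servers)
highest = max(standard_equivalents) / min(ai_servers)

print(f"matched ends: {matched[0]:.1f}x and {matched[1]:.1f}x")
print(f"extremes: {lowest:.1f}x to {highest:.1f}x")


def rack_power_kw(ai_server_count: int, standard_server_kw: float, multiplier: float) -> float:
    """Estimate rack draw in kW from a standard-server baseline (an assumed input)."""
    return ai_server_count * standard_server_kw * multiplier
```

Read this way, each AI server draws roughly ten times the power of a standard server, with a plausible spread of about 7.5x to 13.3x. Plugging your own measured baseline into `rack_power_kw` gives a first-pass number to bring to a layout review or a colocation provider.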

Management

Data center infrastructure, especially at the scale required for AI, is typically managed with a jump box, terminal server, or serial console that allows admins to control multiple devices at once. The best practice is to use an out-of-band (OOB) management device that separates the control plane from the data plane using alternative network interfaces. An OOB console server provides several important functions:

It provides an alternative path to data center infrastructure that isn’t reliant on the production ISP, WAN, or LAN, ensuring remote administrators have continuous access to troubleshoot and recover systems faster, without an on-site visit.
It isolates management interfaces from the production network, preventing malware or compromised accounts from jumping over from an infected system and hijacking critical data center infrastructure.
It helps create an isolated recovery environment where teams can clean and rebuild systems during a ransomware attack or other breach without risking reinfection.

An OOB serial console helps minimize disruptions to AI infrastructure. For example, teams can use OOB to remotely control PDU outlets to power cycle a hung server. Or, if a networking device failure brings down the LAN, teams can use a 5G cellular OOB connection to troubleshoot and fix the problem. Out-of-band management reduces the need for costly, time-consuming site visits, which significantly improves the resilience of AI infrastructure.

AI data center challenges

Artificial intelligence workloads, and the data center infrastructure needed to support them, are highly complex. Many IT teams struggle to efficiently provision, maintain, and repair AI data center infrastructure at the scale and speed required, especially when workflows are fragmented across legacy and multi-vendor solutions that may not integrate. The best way to ensure data center teams can keep up with the demands of artificial intelligence is with a unified AI orchestration platform. Such a platform should include:

Automation for repetitive provisioning and troubleshooting tasks
Unification of all AI-related workflows with a single, vendor-neutral platform
Resilience with cellular failover and Gen 3 out-of-band management

To learn more, read AI Orchestration: Solving Challenges to Improve AI Value

Improving operational efficiency with a vendor-neutral platform

Nodegrid is a Gen 3 out-of-band management solution that provides the perfect unification platform for AI data center orchestration. The vendor-neutral Nodegrid platform can integrate with or directly run third-party software, unifying all your networking, management, automation, security, and recovery workflows. A single, 1RU Nodegrid Serial Console Plus (NSCP) can manage up to 96 data center devices, and even extend automation to legacy and mixed-vendor solutions that wouldn’t otherwise support it. Nodegrid Serial Consoles enable the fast and cost-efficient infrastructure scaling required to support GenAI and other artificial intelligence technologies.

Make Nodegrid your AI data center orchestration platform

Request a demo to learn how Nodegrid can improve the efficiency and resilience of your AI data center infrastructure.

The post AI Data Center Infrastructure appeared first on ZPE Systems.

Why Securing IT Means Replacing End-of-Life Console Servers

Jordan Baker — Thu, 25 Jul 2024 18:56:28 +0000

The world as we know it is connected to IT, and IT relies on its underlying infrastructure. Organizations must prioritize maintaining this infrastructure; otherwise, any disruption or breach has a ripple effect that takes services offline for millions of users (take the recent CrowdStrike outage, for example). A big part of this maintenance is ensuring that all hardware components, including console servers, are up-to-date and secure. Most console servers reach end-of-life (EOL) and need to be replaced, but for many reasons, whether budgetary concerns or the “if it isn’t broken” mentality, IT teams often keep their EOL devices. Let’s look at the risks of using EOL console servers, and why replacing them goes hand-in-hand with securing IT.

The Risks of Using End-of-Life Console Servers

End-of-life console servers can undermine the security and functionality of IT systems. These risks include:

1. Lack of Security Features and Updates

Aging console servers lack adequate hardware and management security features, meaning they can’t support a zero trust approach. On top of this, once a console server reaches EOL, the manufacturer stops providing security patches and updates. The device then becomes vulnerable to newly discovered CVEs and complex cyberattacks (like the MOVEit and Ragnar Locker breaches). Cybercriminals often target outdated hardware because they know that these devices are no longer receiving updates, making them easy entry points for launching attacks.

2. Compliance Issues

Many industries have stringent regulatory requirements regarding data security and IT infrastructure. DORA, NIS2 (EU), NIST CSF 2.0 (US), PCI DSS 4.0 (finance), and the CER Directive are just a few of the updated regulations and frameworks that are cracking down on how organizations architect IT, including the management layer. Using EOL hardware can lead to non-compliance, resulting in fines and legal repercussions. Regulatory bodies expect organizations to use up-to-date and secure equipment to protect sensitive information.

3. Prolonged Recovery

EOL console servers are prone to failures and inefficiencies. As these devices age, their performance deteriorates, leading to increased downtime and disruptions. Most console servers are Gen 2, meaning they offer basic remote troubleshooting (to address break/fix scenarios) and limited automation capabilities. When there is a severe disruption, such as a ransomware attack, hackers can easily access and encrypt these devices to lock out admin access. Organizations then must endure prolonged recovery (just look at the still-ongoing CrowdStrike outage, or last year’s MGM attack) because they need to physically decommission and restore their infrastructure.

The Importance of Replacing EOL Console Servers

Here’s why replacing EOL console servers is essential to securing IT:

1. Modern Security Approach

Zero trust is an approach that uses segmentation across IT assets. This ensures that only authorized users can access resources necessary for their job function. This approach requires SAML, SSO, MFA/2FA, and role-based access controls, which are only supported by modern console servers. Modern devices additionally feature advanced security through encryption, signed OS, and tampering detection. This ensures a complete cyber and physical approach to security.
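The combination of MFA and role-based access control can be sketched as a deny-by-default check. This is a simplified illustration, not any vendor's implementation: the `Session` type, the `is_allowed` function, and the role and permission names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission map; a real deployment would load this
# from an identity provider rather than hard-coding it.
ROLE_PERMISSIONS = {
    "network_admin": {"console:read", "console:write", "power:cycle"},
    "auditor": {"console:read"},
}


@dataclass(frozen=True)
class Session:
    user: str
    role: str
    mfa_verified: bool


def is_allowed(session: Session, permission: str) -> bool:
    """Deny by default: require a completed MFA check and an explicit role grant."""
    if not session.mfa_verified:
        return False
    return permission in ROLE_PERMISSIONS.get(session.role, set())


# An auditor can read console output but cannot power-cycle a device.
auditor = Session(user="bob", role="auditor", mfa_verified=True)
print(is_allowed(auditor, "console:read"))
print(is_allowed(auditor, "power:cycle"))
```

The design choice worth noting is that an unknown role or a missing MFA check results in denial rather than an error or a fallback, which is the behavior a zero trust model expects.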

2. Protection Against New Threats

New CVEs and evolving threats can easily take advantage of EOL devices that no longer receive updates. Modern console servers benefit from ongoing support in the form of firmware upgrades and security patches. Upgrading with a security-focused device vendor can drastically shrink the attack surface by addressing supply chain security risks, codebase integrity, and CVE patching.

3. Ease of Compliance

EOL devices lack modern security features, but this isn’t the only reason why they make it difficult or impossible to comply with regulations. They also lack the ability to isolate the control plane from the production network (see Diagram 1 below), meaning attackers can easily move between the two in order to launch ransomware and steal sensitive information. Watchdog agencies and new legislation are stipulating that organizations follow the latest best practice of separating the control plane from production, called Isolated Management Infrastructure (IMI). Modern console servers make this best practice simple to achieve by offering drop-in out-of-band that is completely isolated from production assets (see Diagram 2 below). This means that the organization is always in control of its IT assets and sensitive data.

Diagram 1: Though an acceptable approach, Gen 2 out-of-band lacks isolation and leaves management interfaces vulnerable to the internet.

Diagram 2: Gen 3 out-of-band fully isolates the control plane to guarantee organizations retain control of their IT assets and sensitive info.

4. Faster Recovery

New console servers are designed to handle more workloads and functions, which eliminates single-purpose devices and shrinks the attack surface. They can also run VMs and Docker containers to host applications. This enables what Gartner calls the Isolated Recovery Environment (IRE) (see Diagram 3 below), which is becoming essential for faster recovery from ransomware. Since the IMI component prohibits attackers from accessing the control plane, admins retain control during an attack. They can use the IMI to deploy their IRE and the necessary applications — remotely — to decommission, cleanse, and restore their infected infrastructure. This means that they don’t have to roll trucks week after week when there’s an attack; they just need to log into their management infrastructure to begin assessing and responding immediately, which significantly reduces recovery times.

Diagram 3: The Isolated Recovery Environment allows for a comprehensive and rapid response to ransomware attacks.

Watch How To Secure The Network Backbone

I recently presented at Cisco Live Vegas on how to secure the network’s backbone using Isolated Management Infrastructure. I walk you through the evolution of network management, and it becomes obvious that end-of-life console servers are a major security concern, both from the hardware perspective itself and their lack of isolation capabilities. Watch my 10-minute presentation from the show and download some helpful resources, including the blueprint to building IMI.


The post Why Securing IT Means Replacing End-of-Life Console Servers appeared first on ZPE Systems.

Critical Entities Resilience Directive

ZPE Systems — Wed, 05 Jun 2024 20:25:06 +0000

The Critical Entities Resilience (CER) Directive is an EU regulation designed to prevent disruption to the services considered essential to society or the economy. The CER Directive outlines the obligations of critical entities to prepare for any potential hazard, including natural disasters, human errors, terrorist attacks, and cybersecurity breaches. EU Member States have until 17 October 2024 to adopt and publish resilience measures required for their critical entities, and those measures officially take effect from 18 October 2024. Member States must identify and notify critical entities by July 2026; these entities then only have ten months to comply with CER requirements. With such a tight timeframe to demonstrate compliance with the Critical Entities Resilience Directive, organizations that might be deemed critical should begin preparing their resilience strategies now.

Citation: Directive (EU) 2022/2557 of the European Parliament and of the Council of 14 December 2022 on the resilience of critical entities and repealing Council Directive 2008/114/EC

Who does the Critical Entities Resilience Directive apply to, and why does it matter?

The CER Directive covers eleven sectors, each with its own subsectors, that provide services essential to society, the economy, public health & safety, or the preservation of the environment. These include:

In-Scope Sectors Covered by the CER Directive
Sector	Subsectors
Energy	Electricity; Heating and cooling; Oil & gas; Hydrogen
Transport	Air; Rail; Water; Road; Public transportation
Banking	Deposit, lending, and credit institutions
Financial Market Infrastructure	Trading venues; Clearing systems
Health	Healthcare providers; Reference laboratories; Medicinal research and development; Pharmaceutical manufacturers; Critical medical device manufacturers; Medicinal distributors
Drinking Water	Drinking water suppliers; Drinking water distributors
Waste Water	Collection; Treatment; Disposal
Digital Infrastructure	Internet Exchange Point providers; DNS providers; Top-level domain (TLD) name registries; Cloud service providers; Data center service providers; Content Delivery Networks (CDNs); Trust service providers; Electronic communications providers; Emergency communication networks
Public Administration	Public administration entities of central governments
Space	Operators of ground-based infrastructure for space-based services
Food Production, Processing, and Distribution	Large-scale industrial food production and processing; Food supply chain services; Food wholesale distributors

The Critical Entities Resilience Directive is one of several new EU regulations (such as DORA and NIS2) created to establish consistent guidelines for resilience in sectors where any service disruption has a significant negative impact on society or the economy. Whereas DORA applies primarily to financial institutions and supporting services, and NIS2 focuses on cybersecurity threats, the CER Directive is broader in scope and addresses other, non-digital threats to resilience such as natural disasters and global health crises (e.g., COVID-19).

The penalties for noncompliance will vary by Member State but are likely to include fines, public notification, remediation, and withdrawal of authorization.

CER Directive requirements for critical entities

Most of the CER Directive requirements apply to Member States, outlining how the designated authorities will adopt and enforce resilience measures and support critical entities in achieving compliance. However, there are five key provisions that relevant organizations should be aware of as they prepare for their identification as critical entities.

1. Article 4: Strategy on the resilience of critical entities

EU Member States have until 17 January 2026 to adopt a strategy outlining the guidelines and procedures for critical entities to achieve and maintain a high level of resilience. Essentially, this strategy will describe the requirements for CER Directive compliance in each Member State and provide guidance on how to meet those requirements. Potentially critical entities can prepare by examining existing resilience frameworks and regulations to anticipate the policies, tools, and procedures that will likely be required.

2. Article 5: Risk assessment by Member States

Member States have until 17 January 2026 to perform a risk assessment of all essential services. These assessments must account for natural and human-made risks, including accidents, natural disasters, public health emergencies, terrorist attacks, and antagonistic threats. Member States will then use the risk assessments to identify critical entities within each sector.

3. Article 12: Risk assessment by critical entities

Critical entities must perform risk assessments using similar criteria to Article 5 within nine months of being notified of their designation as critical and at least every four years afterward. If an organization already conducts risk assessments according to other similar resilience guidelines or frameworks, Member States have the authority to decide whether or not those assessments meet CER Directive compliance requirements.
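To see how these windows translate into calendar dates, here is a small planning sketch. It uses the nine-month risk assessment window and four-year review cycle described above, plus the ten-month compliance window mentioned earlier in this post. The notification date is hypothetical, the helper names are made up, and the simple calendar-month arithmetic is an assumption; how each Member State actually counts its deadlines may differ.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def cer_deadlines(notified_on: date) -> dict[str, date]:
    """Planning dates derived from the date an entity is notified it is critical."""
    first_assessment = add_months(notified_on, 9)
    return {
        "risk_assessment_due": first_assessment,
        "resilience_measures_due": add_months(notified_on, 10),
        # Assumes the first assessment is completed on its due date.
        "next_assessment_latest": add_months(first_assessment, 48),
    }


# Hypothetical notification date, used only to show the arithmetic.
for name, due in cer_deadlines(date(2026, 7, 17)).items():
    print(f"{name}: {due.isoformat()}")
```

For a notification on 17 July 2026, this gives a first risk assessment due on 17 April 2027, resilience measures due on 17 May 2027, and a follow-up assessment no later than 17 April 2031.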

4. Article 13: Resilience measures of critical entities

Critical entities must take the appropriate technical, security, and policy measures to ensure resilience, including a comprehensive strategy for service continuity and disaster recovery. Examples of resilience measures outlined by the CER Directive include:

CER Directive Resilience Measures
Requirements	Examples
Adopt disaster risk reduction and climate adaptation measures	Using an environmental monitoring system to detect and respond to rising temperatures, humidity, and other relevant conditions
Ensure adequate physical protection of the premises and critical infrastructure, including fencing, barriers, perimeter monitoring tools, detection equipment, and access controls	Installing proximity sensors in data center racks to automatically notify security teams if an unauthorized user physically tampers with remote infrastructure
Respond to, resist, and mitigate service disruptions	Deploying out-of-band (OOB) serial consoles with cellular capabilities to ensure continuous remote management access to critical infrastructure
Recover from incidents using business continuity measures to resume provisioning essential services	Building a resilience system containing all the infrastructure and tools needed to rebuild and recover while still delivering core services
Manage employee security by classifying personnel who exercise critical functions, establishing access rights and controls, and performing background checks as needed	Adopting zero-trust security policies and controls that assign access privileges according to role (role-based access control, or RBAC)

5. Article 15: Incident notification

Critical entities must notify the competent authority within 24 hours of detecting any incident that significantly disrupts, or could significantly disrupt, essential services. The significance of a disruption is determined according to the following parameters:

How many users the disruption affects;
How long the disruption lasts;
The geographical area the disruption affects.

The incident notification must explain the nature, cause, and potential consequences of the disruption, including any cross-border implications.
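One way to keep these requirements in front of an incident response team is to capture them in a simple record. The sketch below is illustrative only: the `Incident` class and its field names are hypothetical, and it does not encode any threshold for what counts as a significant disruption, since the parameters above are assessed by the entity and its competent authority.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

NOTIFICATION_WINDOW = timedelta(hours=24)  # "within 24 hours of detection"


@dataclass
class Incident:
    detected_at: datetime
    users_affected: int           # how many users the disruption affects
    duration: timedelta           # how long the disruption lasts
    affected_area: str            # geographical area affected
    nature: str
    cause: str
    potential_consequences: str
    cross_border: bool = False

    def notification_deadline(self) -> datetime:
        return self.detected_at + NOTIFICATION_WINDOW

    def is_overdue(self, now: datetime) -> bool:
        return now > self.notification_deadline()


# Hypothetical example values for illustration only.
incident = Incident(
    detected_at=datetime(2025, 3, 1, 8, 30),
    users_affected=12000,
    duration=timedelta(hours=5),
    affected_area="two regions",
    nature="loss of management access",
    cause="fiber cut",
    potential_consequences="delayed service restoration",
)
print(incident.notification_deadline())
print(incident.is_overdue(datetime(2025, 3, 2, 9, 0)))
```

Here the deadline falls at 08:30 on 2 March 2025, so a check at 09:00 that day reports the notification as overdue.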

How Nodegrid simplifies CER Directive compliance

Nodegrid is a Gen 3 out-of-band management platform that makes the perfect foundation for a resilience system. Nodegrid OOB separates the control plane from the data plane to ensure continuous remote management access to critical infrastructure even during production network outages. Vendor-neutral serial consoles and integrated branch service routers directly host third-party software for security, automation, recovery, and more, reducing hardware overhead at each site while ensuring teams have access to all the tools they need to restore essential services.

Looking to upgrade to a Nodegrid serial console?

Prepare for the Critical Entities Resilience Directive by replacing your discontinued, EOL serial console with a Gen 3 out-of-band solution from Nodegrid.


DORA Act: 5 Takeaways For The Financial Sector

Jordan Baker — Thu, 07 Mar 2024 18:57:50 +0000

The Digital Operational Resilience Act (DORA) is a regulatory initiative within the European Union that aims to enhance the operational resilience of the financial sector. Its main goal is to prevent and mitigate cyber threats and operational disruptions. The DORA Act outlines regulatory requirements for the security of network and information systems “whereby all firms need to make sure they can withstand, respond to and recover from all types of ICT-related disruptions and threats” (DORA Act website).

Who and What Are Covered Under the DORA Act?

The DORA Act is a regulation that covers all financial entities within the European Union (EU). It recognizes the critical role of information and communication technology (ICT) systems in financial services. DORA applies to financial services including payments, securities, credit rating, algorithmic trading, lending, insurance, and back-office operations. It establishes a framework for ICT risk management through technical standards, which are being released in two phases, the first of which was published on January 17, 2024. The DORA Act will go into effect in its entirety on January 17, 2025.

With cyberattacks constantly in the news cycle, it’s no surprise that governing bodies are putting forth standards for operational resilience. But without combing through this lengthy piece of legislation, what should IT teams start thinking about from a practical standpoint? Here are 5 takeaways on what the DORA Act means for the financial sector.

DORA Act: 5 Takeaways for the Financial Sector

1. Shore up your cybersecurity measures

The DORA Act emphasizes strengthening cybersecurity measures within the financial sector. It requires financial institutions, such as banks, stock exchanges, and financial infrastructure providers, to implement robust cybersecurity controls and protocols. These include adopting advanced authentication mechanisms, encryption standards, and network segmentation to protect sensitive financial data and critical infrastructure from cyber threats. Part of this will also require organizations to apply system patches and updates in a timely manner, which means automated patching will become necessary to every organization’s security posture.

2. Implement resilience systems

Operational resilience is a key focus area of the DORA Act, aiming to ensure the continuity of essential financial services in the face of cyber threats, natural disasters, and other operational disruptions. Financial institutions are required to develop comprehensive business continuity plans, establish redundant systems and backup facilities, and conduct regular stress tests to assess their ability to withstand and recover from various scenarios. Implementing a resilience system helps with this, as it provides all the infrastructure, tools, and services necessary to continue operating during major incidents.

3. Conduct regular scans for vulnerabilities

The DORA Act mandates financial institutions to implement robust risk management practices to identify, assess, and mitigate cyber risks and operational vulnerabilities. This includes conducting regular assessments, vulnerability scans, and penetration tests, and developing incident response procedures to quickly address threats. This is all part of taking a proactive approach to identify and mitigate cyber incidents, and reduce the impact that adverse events have on financial stability and consumer confidence.

4. Collaborate and share information with industry peers

The DORA Act encourages financial institutions to share cybersecurity threat intelligence, incident data, and best practices with industry peers, regulators, and law enforcement agencies. The ability to monitor systems and collect data will be crucial to this approach, and will require systems that can rapidly (and securely) deploy apps/services during ongoing incidents. This will help financial institutions to better understand emerging threats, coordinate responses to cyber incidents, and strengthen collective defenses against threats and operational disruptions.

5. Segment physical and logical systems to pass regular audits

Through the DORA Act, regulators are empowered to conduct regular assessments, audits, and inspections of systems. This will ensure that financial institutions are implementing adequate controls and safeguards to protect against cyber threats and operational disruptions. A crucial part of this will involve physical and logical separation of systems, such as through Isolated Management Infrastructure, as well as implementing zero trust architecture across the organization. These measures will help bolster resilience by eliminating control dependencies between management and production networks, which will also help to streamline audits.
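One small, auditable signal of logical separation is that management and production networks do not share address space. The sketch below is a minimal illustration of that single check using Python's standard `ipaddress` module. The subnet values and the `find_overlaps` helper are hypothetical, and passing this check does not by itself demonstrate the physical or control-plane isolation described above.

```python
from ipaddress import ip_network


def find_overlaps(management, production):
    """Return (management, production) CIDR pairs that share address space."""
    overlaps = []
    for mgmt in management:
        for prod in production:
            if ip_network(mgmt).overlaps(ip_network(prod)):
                overlaps.append((mgmt, prod))
    return overlaps


# Hypothetical address plan, for illustration only.
management_nets = ["10.255.0.0/24", "10.20.5.0/28"]
production_nets = ["10.20.0.0/16", "192.168.10.0/24"]

violations = find_overlaps(management_nets, production_nets)
for mgmt, prod in violations:
    print(f"management subnet {mgmt} overlaps production subnet {prod}")
```

A check like this could run as part of a pre-audit script, with the subnet lists exported from whatever source of truth the organization already maintains.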

Get the blueprint to help you comply with the DORA Act

DORA’s requirements are meant to help IT teams better protect sensitive data and the integrity of financial systems as a whole. But without a proper network management infrastructure, production networks remain too sensitive to errors and too vulnerable to attacks. ZPE has created the blueprint that covers these 5 crucial takeaways outlined in the DORA Act. The architecture outlined in this blueprint has been trusted by Big Tech for more than a decade, as it allows them to deploy modern cybersecurity measures, physically and logically separated systems, and rapid recovery processes. Download the blueprint now.

Download blueprint

The post DORA Act: 5 Takeaways For The Financial Sector appeared first on ZPE Systems.

Network Resilience: What is a Resilience System?

ZPE Systems — Fri, 02 Feb 2024 21:57:09 +0000

Network resilience means being able to withstand or recover from adversity, service degradation, and complete outages with minimal business disruption. The longer business-critical services are down, or systems are breached, the greater the risk of significant financial, reputational, and legal consequences. For example, after suffering a major cybersecurity breach in 2020, SolarWinds now faces legal action from the SEC, not to mention damaged customer trust. A resilience system is a set of technologies that enable an organization to continue operating while teams work to repair failures and recover from cyberattacks. But what exactly is a resilience system, and what does it look like? This guide to network resilience defines resilience systems, provides example use cases, compares them to related technologies like backups and redundant systems, and describes the key components required to build them.

What is a resilience system?

A resilience system provides all the infrastructure, tools, and services necessary to continue operating, even if in a degraded state, during major incidents. It also includes everything needed to recover data, rebuild systems, perform security testing, and continue delivering core business functionality. A resilience system is typically isolated from the production network, preventing cybercriminals from finding and compromising it and ensuring teams have continuous access even if the primary network goes down.

Resilience system use cases

Some examples of the challenges that resilience systems help overcome include:

1. Ransomware recovery

In a ransomware attack, cybercriminals infect systems with malware that spreads throughout the network and encrypts any data it encounters. Modern ransomware now uses packaged attacks that move at machine speed, instantly incapacitating entire networks. Organizations completely lose access to critical systems and data until they pay a ransom, often in untraceable cryptocurrency. Ransomware is an exceptionally tenacious form of malware and tends to reinfect backup data and rebuilt systems, significantly hampering recovery efforts and increasing the duration and cost of the attack.

The best practice for resilience systems is to isolate them on an out-of-band (OOB) network, inaccessible to hackers who have breached the production in-band network. Doing so creates a safe, isolated recovery environment (IRE) where teams can restore critical data and systems without the risk of reinfection. The resilience system includes all the tools and hardware needed to restore critical business services and infrastructure. An IRE significantly accelerates ransomware recovery and minimizes downtime, so businesses can avoid paying ransoms and reduce the overall cost of attacks.

2. Network outages

Enterprise network architectures and supply chains are highly complex, with lots of moving parts that rely on external vendors to maintain availability. Just one of those vendors dropping the ball could take the entire organization offline, severely impacting network resilience. For example, in 2023, an expired cryptographic certificate caused Cisco’s Viptela SD-WAN appliances to fail on reboot, completely taking down affected networks until the issue was resolved. With a resilience system, Viptela customers could have potentially avoided this downtime by failing over to alternative network resources. For example, a resilience system with integrated cellular failover allows branches to continue connecting to and delivering critical business services while also providing a lifeline for remote teams to access and recover failed systems. A resilience system also provides observability and automatic notifications so teams are instantly alerted to issues like certificate expirations and can respond quickly to recover critical services.
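As a rough illustration of the kind of certificate-expiration alerting mentioned above, the following sketch computes how many days remain before a certificate's `notAfter` timestamp and flags it when it falls inside a warning window. It uses only Python's standard library. The `days_until_expiry` and `needs_alert` helpers, the 30-day threshold, and the sample timestamp are assumptions for illustration, not a description of how any particular vendor's product works.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    not_after uses the text format found in the dict returned by
    ssl.SSLSocket.getpeercert(), e.g. "Jun 30 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    now = now or datetime.now(timezone.utc)
    return (expires - now).days


def needs_alert(not_after, warn_days=30, now=None):
    """True when the certificate has expired or expires within warn_days."""
    return days_until_expiry(not_after, now) <= warn_days


# Hypothetical certificate timestamp checked against a fixed date so the
# result does not depend on when the script runs.
sample_not_after = "Jun 30 00:00:00 2030 GMT"
check_date = datetime(2030, 6, 1, tzinfo=timezone.utc)

if needs_alert(sample_not_after, now=check_date):
    remaining = days_until_expiry(sample_not_after, now=check_date)
    print(f"certificate expires in {remaining} days")
```

In practice the `notAfter` value would come from a live TLS handshake or a device inventory, and the alert would go to whatever notification channel the team already uses.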

3. Shift to remote work

Incidents like ransomware attacks and equipment failures happen frequently enough that companies can create detailed plans and proactively implement solutions to minimize their impact, but not all adverse events are so predictable. When the COVID-19 pandemic struck, the massive shift to remote work strained the network resources of most organizations. Instead of maintaining a limited number of branch offices, teams suddenly had to treat every employee as a new branch, leading to performance degradation and outages as they scrambled to reinforce the business’s remote capabilities. A resilience system gives teams the tools and resources they need to provision additional infrastructure, manage networking logic, deploy new security solutions, and more, even while the primary network is offline or under a heavy load. A resilience system is the key to quickly adjusting network performance and security to adapt to sudden changes like a transition to fully remote operations.

Do backups and redundancy equate to network resilience?

The short answer is no; backups and redundancy do not equate to network resilience, though they do contribute to making systems more resilient.

Backups are copies of data, configurations, and application code used to do a hot or cold restore when a production system fails. The underlying infrastructure must remain operational for teams to access and use backups, and unless additional resilience measures are taken, it’s easy for backups to become infected or compromised, severely hampering recovery efforts.

Redundancy involves duplicating critical systems, services, and applications as a failsafe in case the primaries go down. Organizations can “fail over” to the redundancies to continue critical business operations during outages. However, redundant systems are just as susceptible to failures and infections without additional resilience measures like out-of-band management and isolated management infrastructure.

Backups and redundancy are part of network resilience but alone are not enough to ensure business continuity. Resilience systems focus on maintaining the architecture of the production network while adding the ability to recover or adapt to adversity. The next section discusses all the tools and technologies that make up network resilience systems.

What does a resilience system look like?

There are four key components that go into a resilience system.

Key Components of a Resilience System
Alternative Networking	Full-stack routing and switching, Wi-Fi, VoIP, virtualization, software-defined network overlays for SDN & SD-WAN
Alternative Compute	Full-stack compute, containers, virtual machines, and any other resources needed to run applications and deliver services
Storage & Storage Recovery	Enough storage to recover systems and applications as well as support content delivery
Automation	Tools like zero-touch provisioning (ZTP) to facilitate speedy recovery while minimizing human error

Alternative networking and compute resources ensure the organization can fail over in the event of a network failure or continue delivering services when production servers are unavailable. Teams also need enough storage to restore backup data, build new systems, and support the content delivery network (CDN). Automation solutions like zero-touch provisioning (ZTP), configuration management, and security validation tools accelerate the recovery process while mitigating the risk of human error. Combined, these components enable teams to reduce the frequency, severity, and duration of outages, improving overall network resilience.
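The failover idea can be pictured as choosing the most preferred link that is currently healthy. The sketch below is a simplified model of that decision only. The `Link` class, the link names, and the priority values are hypothetical, and a real deployment would rely on routing protocols or device features rather than a script like this.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    priority: int  # lower number = more preferred
    healthy: bool


def select_active_link(links):
    """Return the most preferred healthy link, or None if every link is down."""
    healthy = [link for link in links if link.healthy]
    return min(healthy, key=lambda link: link.priority) if healthy else None


# Hypothetical branch uplinks: the primary circuit is down, so traffic
# should move to the next healthy alternative.
uplinks = [
    Link("primary-fiber", priority=0, healthy=False),
    Link("cellular", priority=1, healthy=True),
    Link("satellite", priority=2, healthy=True),
]

active = select_active_link(uplinks)
print(active.name if active else "no healthy link available")
```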

Network resilience with ZPE Systems

A resilient network will continue delivering critical business services in the face of any challenge, whether from cybercriminals, supply chain issues, global events, or even plain human error. A resilience system is isolated from the production network to ensure security and availability, and it consists of all the tools and technologies needed to troubleshoot, recover, and deliver your most crucial data, applications, and infrastructure. The Nodegrid platform from ZPE Systems is the perfect foundation for a resilience system. Nodegrid is a vendor-neutral, out-of-band management solution capable of running your choice of third-party software. Nodegrid allows you to build a highly customizable IRE containing all the tools needed to safely recover from ransomware. You can even use Nodegrid to deliver services while the primary network or systems are down, making it your all-in-one network resilience multi-tool.

Want to ensure network resilience by accelerating ransomware recovery?

Minimize the business impact of ransomware with the help of our whitepaper, 3 Steps to Ransomware Recovery. Learn how to follow Gartner’s best practices to build an Isolated Recovery Environment.

Download Whitepaper

The post Network Resilience: What is a Resilience System? appeared first on ZPE Systems.