Optimize 5G With Containers On Bare Metal

WHITEPAPER

Last updated: November 2, 2020
Executive summary

The second wave of cloud migration for telecommunications networks is in progress. The first, network function virtualization (NFV), lowered costs by moving from appliance-based network elements to software-based equivalents while supporting the legacy need for element managers. Now, communications service providers (CSPs) are embracing cloud-native architectures and containers to increase efficiency, performance, resilience, security, and agility. The architecture of choice is to deploy containers on bare metal without an added layer of virtualization. This option presents significant advantages to their specific use cases. 

This whitepaper highlights the merits of deploying 5G services using containers on bare-metal servers by contrasting that approach with containers on virtual machines (VMs).

Background

As network functions started moving to the cloud, some visionaries at major CSPs sought to move straight to containers and skip network function deployment on virtual machines. Red Hat was also an early proponent of this idea, having seen the value of Kubernetes while contributing heavily to its early releases. However, industry momentum and some shortcomings of containers for telco networks at the time established OpenStack® and VMs as the foundation of that change. Most CSPs have successfully deployed virtual network functions (VNFs) per the European Telecommunications Standards Institute (ETSI) NFV architecture to provide services today. Virtual machines running legacy applications will exist in the network and in the IT space at CSPs for the foreseeable future. However, most new applications and 5G components will be deployed using containers. 

Major technology changes are underway. Legacy applications are being deconstructed from the old monolithic form that best fit virtual machines into microservices. Agile and DevOps are the best practices for developing and managing applications today, and these work best when applications are built using smaller, standalone components. With this approach, features and fixes for each component can be developed and deployed in an iterative way without affecting other components of an application. Continuous integration and continuous delivery (CI/CD) is often required by CSPs and their vendors to support many upgrades or patches each year. Agile development allows them to provide far better support. 

The list of requirements that resulted in VM adoption has decreased over time. For example, a benefit of using VMs was their support for varied guest operating systems. Today, this level of complexity is unnecessary. Linux® is the standard operating system for network applications. Plus, a cloud that supports multiple operating systems increases the complexity of mitigating security vulnerabilities as they are discovered. Additionally, network functions and applications are now typically stateless, rather than stateful, and therefore not as dependent on their affinity to network storage. 

One of the most significant changes in the telco world is the evolution to 5G, in both the radio access network (RAN) and the core.[2] 5G offers a CSP the ability to distribute network functions to physical locations, based on where they make the most sense to the customer and to the business. It is now possible to distribute packet core and voice functions across public clouds, centralized datacenters, regional datacenters, and even on site at the customer’s location. The RAN is also evolving with the collaboration and standardization within the OpenRAN (O-RAN) Alliance and the OpenRAN project to support functionality hosted on commercial off-the-shelf servers instead of proprietary hardware. The components can be spread out geographically, provided by multiple vendors, and use shared infrastructure among components of the telco core and value-added applications for end users. Containers are a natural fit for these improvements, and deploying them on bare metal brings significant benefits for CSPs. For legacy applications, there could be reasons to deploy containers on VMs, but for 5G, deploying containers with a second layer of complexity in the form of VMs is unnecessary and counterproductive.

Operational efficiency

One of the key benefits of virtualizing a telco network is increased efficiency. Traditionally, every service offered to customers had its own technology segment maintained by separate groups managing individual layers of technology, plus network operations and customer support. By putting the different network functions that make up the services on the same cloud infrastructure, these multiple organizational layers are eliminated. 

The evolution from VMs to containers should not be a setback in efficiency. However, OpenStack and Kubernetes are two different areas of expertise, and putting containers in VMs adds a layer of resources to existing NFV architectures instead of just replacing one. Expanding the footprint of the network to include public cloud assets and thousands of remote edge sites significantly increases complexity. Kubernetes addresses this complexity for containers; maintaining virtual machines in the same environment creates unneeded complexity and operational inefficiency. Containers deployed on bare metal are supported in a common way across all cloud variants, whereas VMs and the varied platforms that support them are not. As a result, it takes more people to support container and VM platforms concurrently, and even more people to support different platforms for public and private clouds. Ideally, a single container orchestration platform would be used for both types of cloud.

The move to disaggregated cloud-native network functions (CNFs) can create an administrative challenge related to life-cycle management and assurance. The microservices that make up an application are not necessarily located together, whether on the same server or even at the same site. Without significant automation, life-cycle management is cumbersome. CSPs and vendors employ DevOps tool chains, CI/CD methodology, and now closed-loop assurance to gain efficiency. However, automation requires specific skills that are expensive and in short supply. As much as possible, the CSP should require a common CI/CD pipeline for all vendors.

Upgrades and patches need to be delivered, tested, and deployed automatically in a uniform way. Any additional complexity in the network requires more resources to write and update automation assets and manually maintain functions that cannot be automated. This added complexity hinders CSPs as they try to reduce costs to remain competitive and meet customer demands. 

Service agility

With NFV, CSPs gained service agility and are now able to deploy and decommission services much faster than before.[3] In addition, with 5G, CSPs intend to open their networks to allow third parties to request and provision slices of their network for specialized wholesale applications. This type of service will be exposed via northbound application programming interfaces (APIs) that give third parties controlled access to network resources. The speed of provisioning is critical to customer satisfaction. VMs take much longer to spin up than containers. With VMs, the customer could be waiting for hours while a slice they requested with two clicks of their mouse is instantiated in the form of many VMs at different network locations. By contrast, a container-based service can be deployed almost instantly.

Looking forward, the speed of containers on bare metal also makes differentiated service at the edge possible. An end user can request a value-added service and can have that service deployed on demand on an edge node close to them. The workloads themselves do not have to live continually on the edge nodes, freeing those nodes to host only the most in-demand applications. For instance, a subscriber who is mobile and using a high-bandwidth, ultra-reliable low-latency communications (URLLC) application enabled by 5G could have the application follow them as their proximity to different edge sites changes. The applications are instantiated in seconds as they approach a new edge site. This benefit is not possible if the containers are deployed on VMs, as the resources for all potential applications would have to be reserved at the edge and the VMs would have to be instantiated on demand—which would take an unacceptable amount of time. 

Service agility with containers helps bring in more revenues faster, frees up resources used by decommissioned services quicker, and allows a CSP to take risks that make them more competitive.[4] These benefits are best realized when containers are deployed on bare metal.

Performance

In telecommunication networks, as in nature, evolution should result in advantages. It should solve problems, make businesses more competitive, and improve the chances of survival. Changes that result in higher costs and lower performance are not normally embraced. Telecommunications networks have specific performance requirements that make deploying containers on top of VMs a step backwards.

With NFV, one of the major hurdles for mobile core and internet protocol multimedia subsystem (IMS) applications was the interrupt tax that virtual machines created. Initially, this effect could result in as much as a 20% degradation of the server’s normal performance.[5] For CSPs needing better performance, this result could disqualify certain applications from being virtualized. However, with a focus on architecture and advances in hardware acceleration, like single root input/output virtualization (SR-IOV) and data plane development kit (DPDK), this performance has improved significantly. The combination of containers and VMs makes this interrupt tax an issue again. Containers on VMs no longer cause any significant compute latency, but compared to containers on bare metal, the added overhead of VMs reduces the input/output (I/O) performance of a node. This reduction is more significant with small I/O transactions than with large ones. Voice and data traffic in a telecom network is often very fragmented and consists of small transactions. Today, the network functions requiring the best performance are I/O-intensive instead of central processing unit (CPU)-intensive, and their performance can be significantly impaired if deployed using containers on VMs. The impact can be as high as 30%, requiring more containers to do the same job.[6] More containers on VMs means more servers and more cost.
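As a rough illustration of the arithmetic behind that claim, the following Python sketch estimates how many container instances a given I/O penalty implies. Only the 30% figure comes from the text above; the target load and the per-container throughput are hypothetical values chosen for the example, and the function name is illustrative.

```python
import math


def containers_needed(target_load, per_container_throughput, io_penalty=0.0):
    """Return the number of containers needed to carry target_load.

    io_penalty is the fraction of I/O throughput lost to the VM layer
    (0.30 means each container delivers 70% of its bare-metal rate).
    """
    if not 0.0 <= io_penalty < 1.0:
        raise ValueError("io_penalty must be in the range [0, 1)")
    effective = per_container_throughput * (1.0 - io_penalty)
    return math.ceil(target_load / effective)


# Hypothetical workload: 1,000,000 small transactions per second, with
# each container handling 20,000 transactions per second on bare metal.
bare_metal = containers_needed(1_000_000, 20_000)
on_vms = containers_needed(1_000_000, 20_000, io_penalty=0.30)
print(bare_metal, on_vms)
```

With these assumed inputs, 50 containers on bare metal become 72 containers on VMs, roughly 44% more instances for the same traffic.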

Infrastructure benefits

Before NFV, utilization on network hardware was often very low under normal load. Redundancy was provided in hardware, which added cost. Spare parts were stocked nearby, which further increased expenses. Capacity was modeled for peak usage periods, and daily costs were the same whether a day brought high, profitable demand or lower, less profitable demand.

In theory, virtualizing multiple VNFs on the same infrastructure would significantly improve utilization. Applying advanced orchestration for life-cycle management would eliminate the need for redundant hardware and spares. The cloud would provide common capacity for all VNFs to expand or fail into. In practice, however, VNFs have often been deployed in disparate, vendor-specific hardware segments. The separation was needed because CSPs required vendors to maintain the same performance and uptime for their products as their legacy hardware. Without pooling cloud resources to support the 99.999% availability required by CSPs, the segments were still modeled to support peak capacity. And redundant hardware segments were often deployed for applications requiring high availability. The promise of increased utilization was not often achieved. And the result was often multiple virtual infrastructure managers (VIMs) from different vendors managing VMs in disparate technology segments in the same network. 

Because of modern software development and life-cycle management methods, 5G core and RAN components are typically architected as cloud-native functions running in containers. The architecture specified by the 3rd Generation Partnership Project (3GPP) allows further disaggregation of previously monolithic functions into microservices that will be provided by different vendors and open source projects. CSPs expect vendors to share cloud infrastructure, and cloud-native software architectures make coexistence among vendor solutions more acceptable than it was for NFV and other typical monolithic applications. 

Containers are very good at packaging microservices, and Kubernetes orchestrates applications that are composed of microservices. If the containers are deployed inside VMs, resources for storage, compute, and memory are used for each guest operating system. Those resources are reserved without knowing the needs of the containers that will eventually be deployed in each VM. As a result, utilization stays low, similar to legacy networks. Resources are reserved for peak usage to support a full complement of containers, but demand cannot consistently match the resources set aside. As with legacy networks, when the VM is dimensioned for peak traffic, the typical usage of the VM resources is around 20%.[7] Dimensioning containers on bare metal for an application does not reserve resources in the same way, and the containers can use as little as 20% of the compute resources that the same containers would require on VMs. Therefore, it is possible to fit more containers on a server. With VMs, low utilization and lower container density result in higher capital and operational expenses from the purchase and management of additional hardware.
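The density argument can be sketched with simple arithmetic. The Python example below compares how many workloads fit on one server when each workload holds its peak allocation all the time (VM-style dimensioning) and when it holds only what it typically uses. The 20% typical usage comes from the text above; the server size and peak allocation are hypothetical. The bare-metal figure is an idealized upper bound that leaves no headroom for simultaneous peaks.

```python
def workloads_per_server(server_millicores, peak_millicores, typical_pct,
                         reserve_for_peak):
    """Count the workloads that fit on one server.

    reserve_for_peak=True models VM-style dimensioning, where every
    workload holds its peak allocation permanently. False models shared
    scheduling, where a workload holds only its typical usage.
    Integer millicores avoid floating-point rounding surprises.
    """
    if reserve_for_peak:
        held = peak_millicores
    else:
        held = peak_millicores * typical_pct // 100
    return server_millicores // held


def typical_utilization_pct(count, server_millicores, peak_millicores,
                            typical_pct):
    """Percentage of the server actually in use under typical load."""
    used = count * peak_millicores * typical_pct // 100
    return used * 100 // server_millicores


# Hypothetical server: 64 cores, workloads sized for an 8-core peak,
# typical usage at 20% of peak.
SERVER, PEAK, TYPICAL = 64_000, 8_000, 20

on_vms = workloads_per_server(SERVER, PEAK, TYPICAL, reserve_for_peak=True)
on_bare_metal = workloads_per_server(SERVER, PEAK, TYPICAL,
                                     reserve_for_peak=False)
print(on_vms, typical_utilization_pct(on_vms, SERVER, PEAK, TYPICAL))
print(on_bare_metal,
      typical_utilization_pct(on_bare_metal, SERVER, PEAK, TYPICAL))
```

Under these assumptions, the server holds 8 workloads at 20% typical utilization with peak reservations, and up to 40 workloads when resources are not reserved per VM. A real deployment would land somewhere between the two, depending on how much headroom is kept for bursts.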

For edge applications enabled by 5G, optimizing resources is critical. These resources include network, RAN, and enterprise application workloads deployed close to the customer to benefit from the low latency and predictable bandwidth. Many edge sites are space constrained or environmentally challenged, so there is no room for extra hardware.

Also, the number of sites creates a per-site cost constraint, often limiting each to a single server, because total cost can multiply out of control quickly. For each edge server, virtual machines add the significant cost of all guest operating systems to the cost of the host operating system—the only one required for containers on bare metal. Containers on bare metal more efficiently facilitate more traffic and more services to boost revenue and profit margins. 
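To make the per-site multiplication concrete, the short Python sketch below counts the operating system instances that must be licensed, patched, and audited across an edge deployment. The number of sites and the number of VMs per server are hypothetical values used only for illustration.

```python
def os_instances(sites, servers_per_site, vms_per_server):
    """Total operating system instances across all edge sites.

    Each server runs one host operating system. Each VM adds one guest
    operating system on top of it. Containers on bare metal correspond
    to vms_per_server=0.
    """
    hosts = sites * servers_per_site
    guests = hosts * vms_per_server
    return hosts + guests


# Hypothetical deployment: 2,000 edge sites with a single server each.
SITES, SERVERS_PER_SITE = 2_000, 1

bare_metal_total = os_instances(SITES, SERVERS_PER_SITE, vms_per_server=0)
vm_total = os_instances(SITES, SERVERS_PER_SITE, vms_per_server=4)
print(bare_metal_total, vm_total)
```

With four VMs per server, the assumed deployment grows from 2,000 to 10,000 operating system instances, a fivefold increase in what has to be paid for and maintained.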

Security

For the CSP deploying 5G and edge networks, improved security is a major benefit of deploying containers on bare metal. With 4G/LTE and NFV, a relatively small number of private cloud sites were deployed and managed by CSPs. 5G opens the possibility of more distributed architectures, including multiple public clouds and potentially thousands of small cloud instances closer to or even on customers’ premises. In addition, applications providing value-added services to customers may share the same cloud instances with critical network functions. The combination of scale and multitenancy merits a security consideration. A common misconception exists that a VM is necessary to protect a containerized application from using resources needed for another application. Another misconception is that VMs provide secure isolation of workloads of one tenant from another. In reality, security is a function of the operating system, Linux, and not the orchestrator, Kubernetes. 

VMs are isolated from each other on a cloud node so that workloads cannot take needed resources from one another. And when used for NFV and monolithic applications, they are configured for the requirements of the applications running on them. Containers on VMs can rob each other of resources the same way they could on bare metal if not handled correctly. Also, isolating tenant workloads into separate VMs can lead to reduced utilization. The solution to these challenges is Security-Enhanced Linux (SELinux), a component that ships with several Linux distributions. It is a kernel security module, originally developed by the U.S. National Security Agency (NSA), that allows administrators to control fine-grained authorization for access to the components of the operating system and node. Security policies are set up on the node itself, and then each container deployed on the node implements a policy that protects its interests. Even if the containers are deployed on VMs, SELinux should be invoked on the guest operating system to provide security. VMs are not necessary to secure containers, though. VMs just protect themselves from other VMs. Adding a layer of VMs can actually harm the security of the CSP’s network by making compliance more challenging.
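As an illustration of how a per-container SELinux setting can be expressed, the Python sketch below builds Kubernetes pod manifests for two tenants and prints them as JSON, a format Kubernetes accepts alongside YAML. The `securityContext.seLinuxOptions.level` field is part of the Kubernetes pod specification. The pod names, image names, and multi-category security (MCS) labels are hypothetical, and whether a workload may set its own label depends on cluster policy, so treat this as a sketch and not a hardening recipe.

```python
import json


def pod_manifest(name, image, mcs_level):
    """Build a minimal pod manifest with an SELinux level on its container.

    mcs_level is an SELinux level string such as "s0:c100,c200". Giving
    each tenant distinct categories keeps their processes and files apart
    on a shared node.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "seLinuxOptions": {"level": mcs_level},
                    },
                }
            ]
        },
    }


# Hypothetical tenants sharing one bare-metal node, each with its own
# MCS categories.
tenant_a = pod_manifest("tenant-a-app",
                        "registry.example.com/tenant-a/app:1.0",
                        "s0:c100,c200")
tenant_b = pod_manifest("tenant-b-app",
                        "registry.example.com/tenant-b/app:1.0",
                        "s0:c300,c400")

for manifest in (tenant_a, tenant_b):
    print(json.dumps(manifest, indent=2))
```

The point of the sketch is that the isolation setting travels with the container definition and is enforced by the host kernel, with no VM boundary involved.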

Another security consideration is compliance with security policies determined by the CSP, network tenants, and regulatory bodies. Hardware, operating systems, virtual infrastructure managers, applications, automation scripts, and other components must be constantly monitored and updated to remove potential security exploits discovered by the vendors, open source communities, or industry. As an example, consider the fifth year of operating a 5G network. At that point, multiple generations of every component, possibly from multiple vendors, will exist in the network. Automating the regular patching and repair of drift from security expectations is necessary. Because VMs each add a guest operating system to maintain, staying in compliance is significantly more complicated.

Some containers may support services that have specific SLAs for security. Abstraction of the physical hardware from the container in the form of a VM makes consistency in security and performance difficult. When containers run inside VMs, Kubernetes treats each VM as a node and has no concept of the physical server on which a container is actually placed. Assuring the security of a workload subject to commercial (e.g., payment card industry) or healthcare (e.g., HIPAA) compliance is not possible if an association between the hardware and host operating system cannot be provided for all containers. Security compliance is easier to set up and maintain when the containers are deployed on bare metal.

Conclusion 

4G/LTE and NFV required a relatively small number of private cloud instances to support a large telco network. Today, 5G brings tremendous change. A CSP wanting to benefit from new 5G features like low and predictable latency, high bandwidth, and distributed architecture will find success more easily by deploying new network functions on containers running on bare metal. They can realize lower capital expenditures (CapEx) and operating expenses (OpEx), be more agile and competitive, achieve better performance in a smaller footprint, and deliver greater network security. 

Open source delivers the benefit of faster innovation and problem resolution with entire communities focused on solving the challenges of the future. Choosing open source means that CSPs can avoid lock-in to a specific vendor. Red Hat provides feature-rich distributions of open source projects with value-added features to improve usability and automate administrative functions. CSPs can build their 5G networks with the following: 

  • Kubernetes-native Java™ runtime

Red Hat has helped many CSPs globally with their network transformations to NFV. As a leading contributor to OpenStack, Kubernetes, and many other open source projects, we have significant expertise in evolving telecommunications network architectures to cloud native. Plus, certified network functions from Red Hat ecosystem partners provide choice, allowing you to deploy with confidence. Learn more about Red Hat’s telco solutions. 

[1] ACG Research. “Economic advantages of virtualizing the RAN in mobile operators’ infrastructures,” sponsored by Red Hat. September 2019.

[2] 5G Americas. “5G and the cloud,” December 2019.

[3] Red Hat case study. “Turkcell creates unified telco cloud with Red Hat OpenStack-based NFV,” April 2020.

[4] Esfandiari, Shirin. “Bringing 5G-enabled services to life calls for cloud-native operations.” DevOps.com, August 20, 2019.

[5] Liu, Ming, et al. “Understanding the virtualization ‘Tax’ of scale-out pass-through GPUs in GaaS clouds: An empirical study.” IEEE.org, February 2015.

[6] Lettieri, Giuseppe, et al. “A study of I/O performance of virtual machines.” The British Computer Society, Vol. 61 No. 6, 2018.

[7] Liu, Ming, et al. “Understanding the virtualization ‘Tax’ of scale-out pass-through GPUs in GaaS clouds: An empirical study.” IEEE.org, February 2015.