Andrew Buss, originally published on The Register
Selecting the right components for an industry standard x64 server involves much more than the speed of the CPU and the size of the hard disk. Virtualisation puts fresh demands on CPU performance and, crucially, stretches memory, networking and I/O performance.
Consolidation efficiencies and sustainability efforts are reining in the dash to scale out data centres, with a move back to scale-up platforms based on virtual architectures. These place heavy demands on the reliability, availability and serviceability of every component. What role will the supporting chipset play in meeting these new demands?
The market for server chipsets used to be quite diverse. Component suppliers such as NVIDIA, SiS, Broadcom and ATI provided chipsets, as did large server OEMs such as IBM and HP and the CPU vendors themselves. Choosing an appropriate product could be challenging and complex.
The chipset market has since consolidated, with the result that most server chipsets now come from Intel or AMD for their own processor ranges, with niche offerings from vendors such as IBM or SGI for high-end servers or compute clusters.
With CPU and chipset so tightly coupled, choosing a server chipset might seem a no-brainer. Once the performance characteristics of the server have been decided and the appropriate CPU type, speed and socket or core count selected, the chipset is simply whichever one supports that CPU at the lowest cost. This approach works in many cases, particularly where average utilisation is low and where service quality at times of peak demand can tolerate the odd wobble.
But for many organisations looking to ride the virtualisation wave, consolidate servers or meet challenging business service requirements, the chipset may be the better starting point, with the choice of CPU and ancillary components following from it.
The chipset is the glue that binds the major components of the server together. It has generally provided the interface to system memory, although that role has diminished in recent years as memory controllers have moved onto the CPU itself. But it still links the processors to the major I/O subsystems, particularly storage, networking and PCIe expansion cards.
As such, the chipset has major implications for the overall performance, scalability and reliability of the server in which it is used. As the way servers are used changes, in particular with consolidation reducing the number of physical servers and more workloads being virtualised, these characteristics are becoming more important.
The features and performance of the chipset should sit near the top of the requirements list, rather than being buried in the small print as so many server vendors tend to do. While some manufacturers are quite forthcoming about their own value-added chipsets, it can be exceedingly difficult to tease out any information at all when the chipset comes from the CPU vendor or another third party.
Virtualisation is driving a reconsideration of the scale-out approach that came to characterise server architecture design in the middle part of the last decade. In many cases, fewer large multi-processor boxes running multiple applications were replaced by many smaller, simpler machines, each running a single application and a limited set of operating system services.
Because of this simplicity, reliability was less of an issue: taking a machine down affected only a single service and had limited impact on user productivity. Performance was relatively easy to size for peak load, even if average utilisation sat in the low single-digit percentages.
With power and cooling costs growing and space at a premium, not to mention management overhead, unchecked server sprawl would be the road to hell. Server and workload consolidation has become mainstream, enabled by improvements in virtualisation technologies, management and performance. For those looking to do more with less, a simple strategy may involve consolidating workloads onto fewer, but ultimately similar (if not identical) servers. This may suffice, but could lead to issues with performance, resource contention and scalability.
For companies embracing virtualisation and virtual workload management, a significant reduction in server count may be the best option, and these workloads require serious kit. Such servers would have four, eight or even more sockets. Support for large quantities of memory is crucial, as multiple virtual machines have a tendency to suck up RAM. Just as important is the ability to optimise I/O for virtual machines, which places a premium on hardware acceleration for the virtualisation of networking, storage and data transfers. The choice of chipset is central to providing these capabilities.
As demand grows, so must the supporting infrastructure. Twenty or more active applications may need multiple 10G network interfaces, not just two 1G ports. Latency becomes an issue. Disk activity soars as many virtual machines issue requests at once. Sustained performance becomes a dominant characteristic, and the ability to scale while maintaining good performance at high loads is a key selection criterion. The net result is that the server needs to be considered as a balanced system platform.
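One way to see why the server must be treated as a balanced system is to add up what the consolidated workloads need at peak. The following is a minimal Python sketch under loudly stated assumptions: the VM profiles, the 8GB hypervisor reserve and the 70 per cent link headroom are all hypothetical figures chosen for illustration, not measurements, and `VmProfile`, `ports_needed` and `size_host` are illustrative names rather than any vendor's sizing tool.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VmProfile:
    """Hypothetical peak demands for a group of identical VMs (illustrative only)."""
    name: str
    count: int
    ram_gb: float
    net_gbps: float


def ports_needed(total_gbps: float, port_gbps: float, headroom: float = 0.7) -> int:
    """Ports required if no link runs above `headroom` of line rate (assumed policy)."""
    return math.ceil(total_gbps / (port_gbps * headroom))


def size_host(vms: List[VmProfile], hypervisor_ram_gb: float = 8.0) -> Dict[str, float]:
    """Aggregate memory and network demand for one consolidated host."""
    total_ram = hypervisor_ram_gb + sum(v.count * v.ram_gb for v in vms)
    total_net = sum(v.count * v.net_gbps for v in vms)
    return {
        "ram_gb": total_ram,
        "net_gbps": total_net,
        "ports_1g": ports_needed(total_net, 1.0),
        "ports_10g": ports_needed(total_net, 10.0),
    }


if __name__ == "__main__":
    # Twenty hypothetical VMs: 12 web, 6 application, 2 database.
    workloads = [
        VmProfile("web", 12, ram_gb=4, net_gbps=0.3),
        VmProfile("app", 6, ram_gb=8, net_gbps=0.5),
        VmProfile("db", 2, ram_gb=32, net_gbps=1.5),
    ]
    print(size_host(workloads))
```

With these made-up figures the 20 VMs need 168GB of RAM and roughly 9.6Gbit/s of network bandwidth, which works out to 14 1G ports or two 10G ports. Real sizing would also account for redundant links, storage traffic and burst behaviour, but even this crude sum shows memory and I/O, not just CPU, setting the size of the box.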
As virtualisation usage increases and highly virtualised environments emerge, the features of the chipset become central to the suitability of the platform. High-performance, highly scalable servers are a significant investment. There are fewer of them, they are very expensive and they support an order of magnitude more applications and users than smaller servers running dedicated applications. Failures that might be tolerated on less capable servers, even frequent ones, are not an option on such boxes.
Extensive Reliability, Availability and Serviceability (RAS) features are therefore very desirable in the new virtual server. Multi-bit error detection and correction, together with machine-check capabilities, allow the server to keep running when detected errors can be corrected, and to signal unrecoverable error conditions to software so that workloads can be migrated before shutdown. The ability to hot-swap memory and CPUs, as well as disks and network adapters, without interrupting services keeps a valuable resource in use.
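The correct-or-signal behaviour described above can be illustrated with the simplest error-correcting code of this kind, a SECDED (single-error-correct, double-error-detect) Hamming code. This is a toy Python sketch, not how any particular chipset implements it: real ECC memory uses wider codes, such as 64 data bits protected by 8 check bits, and true multi-bit correction needs stronger schemes such as chipkill, which can survive a whole failed DRAM device. Here a 4-bit value is protected by 4 check bits, and `MachineCheckError` and `read_memory` are hypothetical names standing in for the hardware's machine-check signal to software.

```python
from typing import List, Optional, Tuple

# Hamming(7,4) layout: parity bits at positions 1, 2, 4; data at 3, 5, 6, 7.
# Position 0 holds an overall parity bit, which upgrades SEC to SECDED.
DATA_POSITIONS = (3, 5, 6, 7)


class MachineCheckError(Exception):
    """Hypothetical signal raised when an error is detected but cannot be corrected."""


def encode(data: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SECDED codeword."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        word[p] = sum(word[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    word[0] = sum(word[1:]) % 2  # make the whole codeword even parity
    return word


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return ('ok' | 'corrected' | 'uncorrectable', data or None)."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: treat as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        word = word.copy()
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a non-zero syndrome: detected, not correctable.
        return "uncorrectable", None
    data = sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, data


def read_memory(word: List[int]) -> int:
    """Keep running on correctable errors; escalate uncorrectable ones to software."""
    status, data = decode(word)
    if status == "uncorrectable":
        raise MachineCheckError("uncorrectable memory error: migrate workloads")
    return data


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1          # simulate a single-bit memory fault
    print(decode(word))   # ('corrected', 11)
```

Flipping any single bit of a codeword is silently repaired, while flipping any two bits is reported as uncorrectable. That second case is the point at which a real platform would raise a machine check so that the hypervisor can move virtual machines off the failing host.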