ESX Performance Guide
General Performance Guide
- Server:
- Lots of RAM is highly recommended. Virtualization is RAM heavy!
- Lots of CPUs with lots of cores are recommended. Physical CPUs and cores perform significantly better than what "hyper-threading" provides.
- If possible, avoid NUMA architectures. A cluster of high-performance standard SMP servers is a better fit for VSL than a single large NUMA server.
- BIOS:
- Make sure hardware-assisted virtualization is enabled in the BIOS (Intel Virtualization Technology VT-x, AMD-V)
- Enable VT-d (IOMMU) if you wish to use PCI Passthrough
- Make sure you are running the latest version of the BIOS available for your system.
- Make sure hyper-threading is enabled.
- Set performance and cooling to their highest levels, e.g. enable "Power Profile Maximum Performance", "Increased Cooling", and "Increased Fan Speeds"
- Disable all low-power states such as "C1E Support" and "C-STATE Tech"
- C-states can cause performance degradation or, in some cases, a LINT1 PSOD
- If NUMA is available, make sure it is enabled.
- Host:
- Set the Host Power Policy to High Performance (this can also be scripted; see the pyVmomi sketch after this list)
- Direct IO:
- Prefer PCI Passthrough over VMFS. This provides the highest possible performance on an ESX system (at the cost of functionality such as snapshots); VMFS has a noticeable overhead.
- VMDK:
- Use Zeroed Thick (not Eager Zeroed Thick) formatting when creating VMDKs.
- Make sure partitions are aligned properly. (vSphere Client does this automatically)
- VM:
- Allocate appropriate resources; over-provisioning vCPUs and RAM can itself cause performance issues.
- Install VMware Tools (with virtual drivers)
- Use paravirtualized devices (e.g. PVSCSI, VMXNET3) where possible
- Set CPU/RAM resource shares to high
- Use reserved CPU/RAM resources (see the sketch after this list)
- Tuning:
- Use 'esxtop' (interactively, or in batch mode with 'esxtop -b' for offline analysis) to monitor performance and find bottlenecks
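Both the High Performance host power policy and the per-VM shares/reservations above can be set in the vSphere Client, but they can also be scripted. Below is a minimal sketch using pyVmomi (the Python SDK for the vSphere API). It assumes 'host' and 'vm' are managed objects you have already retrieved after connecting with pyVim.connect.SmartConnect(), and it assumes the power-policy key for High Performance is 1; verify both against your environment before relying on them.

```python
# Minimal pyVmomi sketch -- assumptions are noted in the comments.
# "host" is a vim.HostSystem and "vm" is a vim.VirtualMachine that were looked
# up after connecting with pyVim.connect.SmartConnect(); that boilerplate is
# omitted here.
from pyVmomi import vim


def set_high_performance_power_policy(host, key=1):
    """Apply a host power policy; key=1 is assumed to be 'High Performance'."""
    # List what the host actually advertises before applying the key.
    for policy in host.config.powerSystemCapability.availablePolicy:
        print(policy.key, policy.shortName, policy.description)
    host.configManager.powerSystem.ConfigurePowerPolicy(key)


def reserve_vm_resources(vm, cpu_mhz, mem_mb):
    """Set CPU/RAM shares to 'high' and add fixed reservations for one VM."""
    spec = vim.vm.ConfigSpec(
        cpuAllocation=vim.ResourceAllocationInfo(
            reservation=cpu_mhz,
            # The shares value is ignored unless level is "custom".
            shares=vim.SharesInfo(level="high", shares=0),
        ),
        memoryAllocation=vim.ResourceAllocationInfo(
            reservation=mem_mb,
            shares=vim.SharesInfo(level="high", shares=0),
        ),
    )
    # Reconfiguration is asynchronous; the caller can wait on the returned task.
    return vm.ReconfigVM_Task(spec=spec)
```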
Benchmark Performance Guide
For benchmarking, where real-world numbers no longer matter, the following changes to the "General" performance guide can be made. Take metrics before and after each change to confirm that it actually helps your particular workload (a small comparison sketch follows this list).
- BIOS:
- Disable hyper-threading (questionable)
- Disable NUMA (questionable)
- Direct IO:
- Use PCI Passthrough over VMFS if possible
- VMDK:
- Perform a fresh low-level format of the drive
- Under-format the storage to 50% or less of its capacity
- Create a VMDK that is either tiny or around 50% of the above capacity
- VM:
- Use a single VM only, with lots of RAM and vCPUs (careful: too many may actually cause performance "overhead" problems)
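To make the "take metrics before and after" advice concrete, a small throwaway helper like the one below can diff two result sets. It is plain Python with no vSphere dependency; the JSON file names and metric keys are placeholders for whatever your benchmark tool actually emits.

```python
# Tiny before/after comparison helper -- plain Python, no vSphere dependency.
# The file names and metric keys are placeholders for whatever your benchmark
# tool (fio, Iometer exports, esxtop batch output, ...) produces.
import json


def percent_change(before, after):
    return (after - before) / before * 100.0


def compare(before_path="results_before.json", after_path="results_after.json"):
    with open(before_path) as f:
        before = json.load(f)  # e.g. {"iops": 91000, "avg_latency_us": 480}
    with open(after_path) as f:
        after = json.load(f)
    for metric in sorted(set(before) & set(after)):
        delta = percent_change(before[metric], after[metric])
        print(f"{metric:20s} {before[metric]:>12} -> {after[metric]:>12} ({delta:+.1f}%)")


if __name__ == "__main__":
    compare()
```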
Hardware Storage Considerations
(source: Performance Best Practices for vSphere 5.1)
Back-end storage configuration can greatly affect performance. For more information on storage configuration, refer to the vSphere Storage document for VMware vSphere 5.1.
Lower than expected storage performance is most often the result of configuration issues with underlying storage devices rather than anything specific to ESXi.
Storage performance is a vast topic that depends on workload, hardware, vendor, RAID level, cache size, stripe size, and so on. Consult the appropriate documentation from VMware as well as the storage vendor.
Many workloads are very sensitive to the latency of I/O operations. It is therefore important to have storage devices configured correctly. The remainder of this section lists practices and configurations recommended by VMware for optimal storage performance.
- VMware Storage vMotion performance is heavily dependent on the available storage infrastructure bandwidth. We therefore recommend you consider the information in "VMware vMotion and Storage vMotion" on page 53 when planning a deployment.
- Consider choosing storage hardware that supports VMware vStorage APIs for Array Integration (VAAI). VAAI can improve storage scalability by offloading some operations to the storage hardware instead of performing them in ESXi. (A quick support-check sketch follows this list.)
On SANs, VAAI offers the following features:
- Hardware-accelerated cloning (sometimes called “full copy” or “copy offload”) frees resources on the host and can speed up workloads that rely on cloning, such as Storage vMotion.
- Block zeroing speeds up creation of eager-zeroed thick disks and can improve first-time write performance on lazy-zeroed thick disks and on thin disks.
- Scalable lock management (sometimes called “atomic test and set,” or ATS) can reduce locking-related overheads, speeding up thin-disk expansion as well as many other administrative and file system-intensive tasks. This helps improve the scalability of very large deployments by speeding up provisioning operations like boot storms, expansion of thin disks, snapshots, and other tasks.
- Using thin provision UNMAP, ESXi can allow the storage array hardware to reuse no-longer-needed blocks.
- NOTE: In this context, "thin provision" refers to LUNs on the storage array, as distinct from thin-provisioned VMDKs.
On NAS devices, VAAI offers the following features:
- Hardware-accelerated cloning (sometimes called “full copy” or “copy offload”) frees resources on the host and can speed up workloads that rely on cloning. (Note that Storage vMotion does not make use of this feature on NAS devices.)
- Space reservation allows ESXi to fully preallocate space for a virtual disk at the time the virtual disk is created. Thus, in addition to the thin provisioning and eager-zeroed thick provisioning options that non-VAAI NAS devices support, VAAI NAS devices also support lazy-zeroed thick provisioning.
- NAS native snapshot can create virtual machine linked clones or virtual machine snapshots using native snapshot disks instead of VMware redo logs. This feature, new in ESXi 5.1 and requiring virtual machines running on virtual hardware version 9, offloads tasks to the NAS device, thus reducing I/O traffic and resource usage on the ESXi hosts.
- NOTE: Initial creation of virtual machine snapshots with NAS native snapshot disks is slower than creating snapshots with VMware redo logs. To reduce this performance impact, we recommend avoiding heavy write I/O loads in a virtual machine while using NAS native snapshots to create a snapshot of that virtual machine. Similarly, creating linked clones using NAS native snapshots can be slightly slower than the same task using redo logs.
- Though the degree of improvement is dependent on the storage hardware, VAAI can reduce storage latency for several types of storage operations, can reduce the ESXi host CPU utilization for storage operations, and can reduce storage network traffic.
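Whether a given device is actually eligible for VAAI offloads can be checked on the host with 'esxcli storage core device vaai status get'. The sketch below is a coarser, scripted check using pyVmomi that reads each SCSI LUN's vStorageSupport flag; it assumes 'host' is an already-retrieved vim.HostSystem, and the exact support strings are assumptions to verify against your SDK version.

```python
# Coarse VAAI support check via pyVmomi. "host" is a vim.HostSystem retrieved
# after SmartConnect(); the vStorageSupport strings are assumed values, so
# verify them against your SDK version.
def report_vaai_support(host):
    """Print hardware-acceleration (VAAI) support for each SCSI LUN on a host."""
    device_info = host.configManager.storageSystem.storageDeviceInfo
    for lun in device_info.scsiLun:
        # Typically one of: vStorageSupported, vStorageUnsupported, vStorageUnknown
        support = getattr(lun, "vStorageSupport", "n/a")
        print(f"{lun.canonicalName:36s} {support}")
```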
Additional:
- Performance design for a storage network must take into account the physical constraints of the network, not logical allocations. Using VLANs or VPNs does not provide a suitable solution to the problem of link oversubscription in shared configurations. VLANs and other virtual partitioning of a network provide a way of logically configuring a network, but don't change the physical capabilities of links and trunks between switches.
- VLANs and VPNs do, however, allow the use of network Quality of Service (QoS) features that, while not eliminating oversubscription, do provide a way to allocate bandwidth preferentially or proportionally to certain traffic. See also "Network I/O Control (NetIOC)" on page 34 for a different approach to this issue.
- Make sure that end-to-end Fibre Channel speeds are consistent to help avoid performance problems. For more information, see VMware KB article 1006602.
- Configure maximum queue depth if needed for Fibre Channel HBA cards. For additional information see VMware KB article 1267.
- Applications or systems that write large amounts of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device with other applications or systems. These types of applications perform best with dedicated connections to storage devices.
- For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility.
- Recovering from these dropped network packets results in large performance degradation. In addition to time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
- Be aware that with software-initiated iSCSI and NFS the network protocol processing takes place on the host system, and thus these might require more CPU resources than other storage options.
- Local storage performance might be improved with write-back cache. If your local storage has write-back cache installed, make sure it’s enabled and contains a functional battery module. For more information, see VMware KB article 1006602.
- Make sure storage adapter cards are installed in slots with enough bandwidth to support their expected throughput. Be careful to distinguish between similar-sounding—but potentially incompatible—bus architectures, including PCI, PCI-X, PCI Express (PCIe), and PCIe 2.0 (aka PCIe Gen 2), and be sure to note the number of “lanes” for those architectures that can support more than one width.
- For example, in order to supply their full bandwidth potential, single-port 16Gbps Fibre Channel HBA cards would need to be installed in at least PCI Express (PCIe) G1 x8 or PCIe G2 x4 slots (either of which are capable of a maximum of 20Gbps in each direction) and dual-port 16Gbps Fibre Channel HBA cards would need to be installed in at least PCIe G2 x8 slots (which are capable of a maximum of 40Gbps in each direction).
- These high-performance cards will typically function just as well in slower PCIe slots, but their maximum throughput could be limited by the slots’ available bandwidth. This is most relevant for workloads that make heavy use of large block size I/Os, as this is where these cards tend to develop their highest throughput.
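The slot-sizing arithmetic above is easy to get wrong when mixing PCIe generations and lane counts. The small sketch below simply reproduces the raw per-direction lane rates used in the example (2.5 Gbps per lane for PCIe Gen 1, 5 Gbps for Gen 2, before encoding overhead) so a card/slot pairing can be sanity-checked; the HBA figures are illustrative.

```python
# Sanity-check PCIe slot bandwidth against an HBA's aggregate port bandwidth.
# Uses the same raw per-direction lane rates as the example above (Gen1 = 2.5
# Gbps/lane, Gen2 = 5 Gbps/lane); real usable throughput is lower once 8b/10b
# encoding and protocol overhead are subtracted.
RAW_GBPS_PER_LANE = {1: 2.5, 2: 5.0}


def slot_bandwidth_gbps(gen, lanes):
    return RAW_GBPS_PER_LANE[gen] * lanes


def slot_is_sufficient(gen, lanes, ports, port_gbps):
    """True if the slot's raw bandwidth covers all ports running at line rate."""
    return slot_bandwidth_gbps(gen, lanes) >= ports * port_gbps


if __name__ == "__main__":
    # Single-port 16Gbps FC HBA: Gen1 x8 (20 Gbps) or Gen2 x4 (20 Gbps) suffices.
    print(slot_is_sufficient(gen=1, lanes=8, ports=1, port_gbps=16))  # True
    print(slot_is_sufficient(gen=2, lanes=4, ports=1, port_gbps=16))  # True
    # Dual-port 16Gbps FC HBA: needs at least Gen2 x8 (40 Gbps).
    print(slot_is_sufficient(gen=2, lanes=4, ports=2, port_gbps=16))  # False
    print(slot_is_sufficient(gen=2, lanes=8, ports=2, port_gbps=16))  # True
```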
Documents
- Perf_Best_Practices_vSphere5.0.pdf - http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
- Perf_Best_Practices_vSphere5.1.pdf - http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf