VMworld 2014/Virtual SAN Best Practices for Monitoring and Troubleshooting


Virtual SAN

Virtual SAN - software-based storage built into ESXi

  • Aggregates local Flash and HDDs
  • Shared datastore for VM consumption
  • Distributed architecture
  • Deeply integrated with VMware stack

VSAN GA with ESXi 5.5 Update 1

RVC

RVC - started as a VMware Labs "Fling"

  • Interactive command line, with lots of VSAN commands
  • Included in vCenter Server (VC) since 5.5 (Windows and appliance)
  • Presents inventory as a file structure

HCL

Verify Hardware against VMware Compatibility Guide (VCG)

HCL Guides:

  • vSphere general compatibility guide (Servers, NICs, etc)
  • Virtual SAN compatibility guide - adapters, Flash and HDDs

Show adapters using RVC:

vsan.disk_info --show-adapters <cluster>/hosts/*

Virtual SAN HCL - http://vmware.re/vsanhcl

HCL steps:

When viewing an HCL entry, also check the device "Class", because performance is important

Network

Network - Misconfiguration Detected

  • VSAN requires 10GbE (or dedicated 1GbE)
  • Single L2 network among ESX hosts
  • IP Multicast

Show ESX configuration:

esxcli vsan cluster get
RVC: vsan.cluster_info <cluster>

Ensure all hosts have VSAN vmknic configured

WebClient: host -> manage -> networking -> vmkernel adapters
esxcli vsan network list
RVC: vsan.cluster_info <cluster>

Ensure VSAN vmknics are on right subnet

WebClient: host -> manage -> networking -> vmkernel adapters
esxcli ?
RVC: vsan.cluster_info <cluster>
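
The two vmknic checks above (every host has a VSAN vmknic, and all of them sit on the right subnet) can be sketched offline in Python. This is only an illustration: the host names, addresses, subnet and the check_vsan_vmknics helper are hypothetical, and the addresses are assumed to have been collected by hand from the Web Client or from esxcli vsan network list on each host.

```python
import ipaddress

def check_vsan_vmknics(vmknics, expected_subnet):
    """Report hosts with no VSAN vmknic and hosts whose vmknic is off-subnet.

    vmknics maps a host name to the IP address of its VSAN vmknic,
    or to None when the host has no VSAN vmknic configured.
    """
    network = ipaddress.ip_network(expected_subnet)
    missing = sorted(host for host, addr in vmknics.items() if addr is None)
    off_subnet = sorted(
        host for host, addr in vmknics.items()
        if addr is not None and ipaddress.ip_address(addr) not in network
    )
    return {"missing": missing, "off_subnet": off_subnet}

# Hypothetical inventory, typed in by hand for this example.
vsan_vmknics = {
    "esx-01": "192.168.50.11",
    "esx-02": "192.168.50.12",
    "esx-03": "192.168.51.13",  # wrong subnet
    "esx-04": None,             # no VSAN vmknic configured
}

print(check_vsan_vmknics(vsan_vmknics, "192.168.50.0/24"))
```

Any host listed under "missing" or "off_subnet" is a candidate for the "Misconfiguration Detected" network status.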

Ensure Multicast is configured

tcpdump-uw -i <vmknic> udp port 23451
tcpdump-uw -i <vmknic> igmp

Issues

VM shows as non-compliant / inaccessible / orphaned

  • non-compliant - maybe one mirror down
  • inaccessible - really bad
  • orphaned - VC has forgotten about the VM

VSAN object accessible:

  • at least one RAID mirror is fully intact
  • quorum: more than 50% of components need to be available (witnesses count here)
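
The accessibility rule above can be written as a small Python check. This is a simplified sketch of the two conditions as stated in these notes, not VSAN's actual implementation; the Component type, the is_accessible helper and the two-mirrors-plus-witness example layout are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Component:
    mirror: Optional[int]  # mirror index for data components, None for a witness
    available: bool

def is_accessible(components: List[Component]) -> bool:
    """Apply the two rules: one fully intact mirror, and more than 50% of components up."""
    # Group the availability of data components by the mirror they belong to.
    mirrors = {}
    for comp in components:
        if comp.mirror is not None:
            mirrors.setdefault(comp.mirror, []).append(comp.available)
    one_mirror_intact = any(all(states) for states in mirrors.values())

    # Witnesses count towards quorum, so count every component here.
    available = sum(1 for comp in components if comp.available)
    has_quorum = available * 2 > len(components)
    return one_mirror_intact and has_quorum

# Assumed layout: two mirrors (one component each) plus one witness.
one_mirror_down = [
    Component(mirror=0, available=False),
    Component(mirror=1, available=True),
    Component(mirror=None, available=True),
]
mirror_and_witness_down = [
    Component(mirror=0, available=True),
    Component(mirror=1, available=False),
    Component(mirror=None, available=False),
]

print(is_accessible(one_mirror_down))          # True: mirror 1 intact, 2 of 3 available
print(is_accessible(mirror_and_witness_down))  # False: mirror 0 intact, but only 1 of 3 available
```

The first case corresponds to a non-compliant but still accessible object; the second shows why a surviving mirror alone is not enough without quorum.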

RVC Reports

VSAN RVC state reports:

vsan.vm_object_info <vm>
vsan.disks_stats <cluster>
vsan.obj_status_report <cluster>
vsan.obj_status_report --filter-table 2/3 --print-uuids <cluster>
vsan.cluster_info <cluster>
vsan.resync_dashboard <cluster>
vsan.check_state --refresh-state <cluster>
vsan.check_limits <cluster>

Diagnostics

Use the vSphere Web Client - the C# desktop client doesn't show VSAN or VSAN errors

VM Provisioning Started Failing -

  • Don't use: Cluster - Manage - Settings - Disk Management (this is where the disks were set up; it is not the right place to check disk health)
  • Use: monitor - virtual SAN - physical disk

Proactive approach: try creating a VM on every host in the cluster:

web client: standard method
rvc: diagnostics.vm_create -d <datastore> -v <vmfolder> <cluster>

VMware believes in "Dog Fooding" - have many internal VSAN clusters running

Benchmarking

VSAN Observer (vsan.observer in RVC)

  • collects stats every 60 seconds
  • web interface
  • HOL Plug: check out VSAN Observer Hands On Labs

The Outstanding IO chart in Observer is a good indicator of whether SSD speed is insufficient (which affects latency)

VSAN implements a priority traffic scheduler

Good References

Webinars on Monitoring/Troubleshooting:

VMware Blogs:

Community Blogs: