VMworld 2014/Other Notes
8/24/14 (VMworld in SFO)
easy walk from hotel; registered using the QR code in the email
sw defined data center
umbrella term for underlying technologies: vcloud management, vsan, nsx
reduces overhead of datacenter IT
customer reports IT department went from 500 people to 39 - runs business on 6 racks
vCloud - private cloud
this is seismic - maybe you felt it early this morning
vRealize - packages of management components
enterprise
smb
SaaS (sw as a service) - vRealize Air Automation
new competencies this year: mgmt automation, sw defined storage, networking virtualization
paas (question)
openstack APIs provides access to sddc infrastructure - vmware contributes to this open source project
www.openstack.org - Open source software for building private and public clouds.
hybrid cloud strategy
vcloud air is a hybrid strategy
vcloud air replaces on-premises services, as needed
same mgmt, networking, & security
today 6% of workload is in the cloud
services
devops
db as a service
Microsoft SQL & MySQL
object storage
beta using EMC in Sept - GA about EOY
mobility services
cloud mgmt
vRealize Air Automation (formerly vCloud automation center)
whats new in vCloud suite
dc virtualization & standardization
vCenter support assistant
automatic regular data collection with pattern recognition
security controls native to infrastructure
vSphere replication improvements
HA & Resilient Infrastructure
vCenter Site Recovery Manager (SRM)
disaster recovery
disaster avoidance
planned migration
can test a proposed migration/upgrade
new
vCO plugin
APIs are also accessible via PowerCLI
support for more vms
faster using batch processing
integrated with web UI
works within a local NSX environment, not across the entire NSX environment
does not support vCloud air, but the cloud will have this type of functionality someday
app & infrastructure delivery automation
vcac
new interfaces with more flexible workflows
NSX - control from vSphere
puppet integration
localization - 10+ languages
8/25/14 (at VMworld)
Storage DRS deep dive
stuff is in vSphere 6.0 beta
problem
shared datastore with 2 different workloads
you add a backup, but it uses a lot of your bandwidth
you want it to use just enough bandwidth that it finishes on time
storage performance controls
shares
VM is assigned a shares value indicating the relative IOPs load it should get
limit
max IOPs allowed per VM
reservations
min IOPs per VM
ESX 5.5 IO scheduler (mClock)
implements scheduling using the above controls
breaks large IOs into 32KB chunks for accounting purposes, so the IOPs controls also effectively control bandwidth (see the sketch below)
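A rough sketch of that accounting idea, assuming a 32KB charge unit; the function names are mine and this is not the actual mClock code:
 # Illustrative only: charging large IOs as multiple 32KB units makes an IOPs limit also cap bandwidth.
 ACCOUNTING_UNIT = 32 * 1024  # bytes
 
 def charged_units(io_size_bytes):
     """Scheduler accounting units charged for one guest IO (ceiling division)."""
     return max(1, -(-io_size_bytes // ACCOUNTING_UNIT))
 
 def max_bandwidth(iops_limit, io_size_bytes):
     """Upper bound on bytes/sec a VM can push under an IOPs limit for IOs of this size."""
     real_ios_per_sec = iops_limit / charged_units(io_size_bytes)
     return real_ios_per_sec * io_size_bytes
 
 # A 1000-IOPs limit with 256KB IOs: each IO is charged as 8 units,
 # so the VM gets ~125 real IOs/sec, about 31 MiB/s.
 print(max_bandwidth(1000, 256 * 1024) / 2**20, "MiB/s")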
storage IO control
works across hosts that share a store
congestion detection based on latency threshold, causes host to be throttled
threshold is a setting
sdrs overview
balances load by moving vdisks between stores in the storage cluster
allows vdisks to have affinity for each other, so if one wants to move, the others will also
sdrs deployment
you have to understand how this works when using complex storage use cases
thin
dedup
auto-tiering
sdrs monitors replications
storage io control best practices
don't mix vsphere luns and non-vsphere luns
set host IO queue size to the highest allowed value
set congestion threshold conservatively high
ds cluster best practices
similar ds performance
similar capacities
ds & host connectivity
allow max possible connectivity
vSphere storage policy based management
now works with different profiles
challenges in external storage architectures
hypervisor can help
knows the needs of the apps in real time
global view of infrastructure
SDS & VVols
policy-driven control plane
virtual data plane
virtual data services
virtual datastores
VASA provider is a new player - agent for the array; ESX manages the array via VASA APIs
arrays are logically partitioned into Storage Containers
vm disks called virtual volumes are created natively on the Storage Containers
IO from ESX to the array is through an access point called a Protocol Endpoint (PE), so the data path is essentially unchanged
advertised data services are offloaded to the array
managed through policies - no need to do LUN management (see the data-model sketch below)
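A toy data model, just to keep the new terms straight (class names and fields are my own, not the VASA API): the array is partitioned into storage containers, each VM disk becomes a virtual volume created natively in a container, and placement is driven by a policy instead of LUN management.
 # Toy illustration of the VVol vocabulary; names are illustrative, not the VASA API.
 from dataclasses import dataclass, field
 
 @dataclass
 class StoragePolicy:
     name: str
     requirements: dict              # e.g. {"replication": "async", "snapshots": True}
 
 @dataclass
 class StorageContainer:             # logical partition of the array, advertised by the VASA provider
     name: str
     capabilities: dict
     vvols: list = field(default_factory=list)
 
 @dataclass
 class VirtualVolume:                # one VM disk, created natively on the array
     vm: str
     size_gb: int
     policy: StoragePolicy
 
 def place(vm, size_gb, policy, containers):
     """Pick the first container whose advertised capabilities satisfy the policy."""
     for c in containers:
         if all(c.capabilities.get(k) == v for k, v in policy.requirements.items()):
             c.vvols.append(VirtualVolume(vm, size_gb, policy))
             return c
     raise RuntimeError("no compliant container for policy " + policy.name)
 
 gold = StoragePolicy("gold", {"replication": "async", "snapshots": True})
 containers = [StorageContainer("sc-slow", {"snapshots": True}),
               StorageContainer("sc-fast", {"replication": "async", "snapshots": True})]
 print(place("vm01", 40, gold, containers).name)   # -> sc-fast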
HP 3PAR and VMware
Understanding virtualized memory management performance
concerns
vms configured memory size
too small -> low performance
too large -> high overhead
#vms/host
too many -> low performance
too few -> wastes host memory
memory reclamation method
proper -> minimal performance impact
layered mem mgmt (app, vm, host)
each layer assumes it owns all configured memory
each layer improves mem utilization by using free memory for optimizations
cross-layer knowledge is limited
memory undercommit
sum of all vm memory size <= host memory
no reclamation
memory overcommit
sum > host memory
ESX may map only a subset of VM memory (reclaims the rest)
memory entitlement & reclamation
compute memory entitlement for each VM & reclaim if entitlement < consumed (see the sketch after the parameter list below)
based on reservation, limit, shares, memory demand
ESX classifies memory as active & idle
sample each page each minute & see which were used
entitlement parameters
configured memory size (what guest sees)
reservation (min)
limit (max)
shares (relative priority for the VM)
idle memory
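A much-simplified illustration of how those parameters might combine, assuming reservation is granted first, the remainder is split by shares, and each VM is capped by its limit and its demand; this is my own simplification, not the actual ESX entitlement algorithm:
 # Simplified illustration of entitlement inputs; not the real ESX memory scheduler.
 def entitlements(host_mb, vms):
     """vms: dicts with name, reservation, limit, shares, demand (MB, except shares).
     Grant each reservation, split the remainder by shares, cap by limit and demand."""
     grant = {v["name"]: v["reservation"] for v in vms}
     remaining = host_mb - sum(grant.values())
     total_shares = sum(v["shares"] for v in vms)
     for v in vms:
         extra = remaining * v["shares"] / total_shares
         grant[v["name"]] = min(v["limit"], v["demand"], grant[v["name"]] + extra)
     return grant
 
 def reclaim_target(consumed_mb, entitled_mb):
     # ESX reclaims from a VM when its entitlement falls below what it has consumed
     return max(0, consumed_mb - entitled_mb)
 
 vms = [{"name": "db",  "reservation": 4096, "limit": 16384, "shares": 2000, "demand": 12000},
        {"name": "web", "reservation": 1024, "limit":  8192, "shares": 1000, "demand":  3000}]
 print(entitlements(24576, vms))   # both VMs end up demand-capped in this example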
reclamation techniques
transparent page sharing - remove duplicate 4K pages in background
uses content hash
ballooning - pushes memory pressure from ESX into the VM - used when host free memory drops to ~4% of ESX memory
allocates pinned memory from guest
now that we know the guest can't use that memory, it is reclaimed and given to another VM
possible side effect: cause paging in guest
swapping & compression
if ballooning runs out of memory
randomly chooses a page to compress/swap - uses swap if compression savings < 50% (see the sketch below)
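A sketch of the two mechanisms above in miniature: duplicate 4K pages are found via a content hash, and under pressure a page is compressed only if that saves at least half of it, otherwise it is swapped. Hash choice and names are mine, purely illustrative.
 # Illustrative sketch of content-hash page sharing and the compress-vs-swap decision.
 import hashlib, os, zlib
 
 PAGE = 4096
 
 def share_pages(pages):
     """Count 4K pages reclaimable by pointing duplicates at one canonical copy.
     (A real implementation verifies full page contents before sharing.)"""
     canonical, shared = {}, 0
     for p in pages:
         h = hashlib.sha1(p).digest()
         if h in canonical:
             shared += 1
         else:
             canonical[h] = p
     return shared
 
 def compress_or_swap(page):
     """Compress only if that saves >= 50% of the page; otherwise swap it out."""
     packed = zlib.compress(page)
     return ("compressed", packed) if len(packed) <= PAGE // 2 else ("swapped", page)
 
 zero = bytes(PAGE)                                            # zero-filled pages share and compress trivially
 print(share_pages([zero, zero, os.urandom(PAGE)]))            # -> 1 duplicate page shared
 print(compress_or_swap(zero)[0], compress_or_swap(os.urandom(PAGE))[0])  # -> compressed swapped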
best practices
performance goals
handle burst memory pressure well
constant memory pressure should be handled by DRS/vMotion, etc
monitoring tools
vCenter performance chart, esxtop, memstats
host level
use when isolating problem
vCenter Operations (vCOps)
monitor cluster/dc
determine if you have a problem
guard against active memory reclamation
vm mem size > highest demand during peak loads
if necessary, set the reservation above guest demand
use stats from vCOps manager gui
page sharing & large page
memory saving from page sharing good for homogeneous vms
intra- & inter-vm sharing
what prevents sharing
guest has ASLR (address space layout randomization)
guest has super fetching (proactive caching)
host uses large pages, and ESXi does not share large pages
why large page
fewer tlb misses
faster page table look up time
impact on memory overcommitment
sharing broken when any small page is ballooned or swapped
best practices
don't disable page sharing
don't disable host large page, except with VDI
install vmware tools & enable ballooning
provide sufficient swap space in guest
place guest swap file/partition on separate vdisk
don't disable memory compression
host cache is nice to have - maybe 20% of ssd - more is potentially wasteful
optimizations of host swapping
sharing before swap
compressing before swap
swap to host cache ssd
memory overcommitment guidance
configured
sum of all VM configured memory / host memory size
keep > 1
active
sum of all VM active memory / host memory size
keep < 1
use vCenter Operations to track avg & max memory demand (worked example below)
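A worked example of the two ratios above; the host size and per-VM numbers are made up for illustration:
 # Worked example of the overcommitment ratios; host size and VM numbers are invented.
 host_mem_gb = 256
 configured = [32, 32, 64, 64, 48, 48]   # configured memory per VM (GB)
 active     = [ 8, 10, 20, 12,  6, 15]   # active/demanded memory per VM (GB)
 
 configured_ratio = sum(configured) / host_mem_gb   # ~1.1: > 1, so overcommit is actually being used
 active_ratio     = sum(active) / host_mem_gb       # ~0.28: < 1, so little reclamation pressure
 
 print(f"configured overcommit {configured_ratio:.2f}, active overcommit {active_ratio:.2f}")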
monitor performance counters
mem.consumed by itself does not tell you anything
reclamation counters (mem.balloon, swapUsed, compressed, shared) - non-0 values do not mean there is a problem
it just means these things have done their job somewhere in the past
mem.swapInRate constant non-0 means problem
mem.latency - estimates the perf impact due to compression/swapping
mem.active - if low, reclaimed memory is not a problem
virtDisk.readRate writeRate
large means more swapping is happening
IO Filtering
allows filters to process a VM's IO to its VMDKs; runs inside ESX, outside the VM
allow 3rd party data services
VAIO
filters running in userspace
allows for out-of-band releases - isolates filters from kernel
extremely performant - ~1us latency added by the filter framework
the ESX kernel was modified to allow a usermode driver like this to be extremely performant
general purpose API - raw IO stream
limit v1 SDK to 2 use cases (for test considerations)
cache
replication
only on vSCSI devices
vSCSI turns T10 cmds into ioctls - find out more about this (?)
services
high performance event queue access
tight integration with vSphere
full access to guest ios - synch access
automated deployment
flexible - requires user to add vC extensions to manage
design
filter driver registers with VAIO
IO: VM -> VAIO -> filter driver -> VAIO -> hardware (modeled in the sketch below)
filter has to send the IO on eventually
response: hardware -> VAIO -> filter driver -> VAIO -> VM
filter may initiate its own IOs
filter may talk to flash or "other" that is recognized as a block device
filters may share kernel space memory or can use IP sockets
events indicate when a snapshot or vMotion occurs
only in C
need both 32- and 64-bit versions, because ESX is a 32-bit OS with a 64-bit process space
one instance per VMX, must be re-entrant
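A conceptual model of that flow, purely to make the chain concrete; real VAIO filters are C plugins built against the SDK, and the class names here are invented:
 # Conceptual model of the IO path above; real filters are C plugins, not Python classes.
 class Device:
     def submit(self, io):
         return ("completed", io)             # stand-in for the real hardware path
 
 class CachingFilter:
     """Example filter: serve repeat reads from a local cache, otherwise forward the IO."""
     def __init__(self, next_stage):
         self.next = next_stage
         self.cache = {}
 
     def submit(self, io):
         op, lba = io
         if op == "read" and lba in self.cache:
             return self.cache[lba]           # the filter may complete an IO itself...
         result = self.next.submit(io)        # ...but otherwise must eventually forward it
         if op == "read":
             self.cache[lba] = result         # completions flow back through the same chain
         return result
 
 chain = CachingFilter(Device())              # framework wiring: VM -> filter(s) -> device
 print(chain.submit(("read", 100)))           # miss: goes down to the device
 print(chain.submit(("read", 100)))           # hit: served by the filter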
EMC recoverpoint is partner - to be in their 2015 release
SanDisk VAIO server side caching
scalable, distributed r/w cache
beta Q4 2014 with ESX6.0
filters must be certified (signed by vmware)
expect GA early in 2015 (depends on ESX6)
8/26/14 (at VMworld)
SanDisk cache
virtual SAN - 3-32 nodes share local storage
contains vmdks
virtual SAN cache
30% reserved for write buffer
storage policy
failure to tolerate setting
number of disk drives per object (stripe width)
design considerations
performance
#disk groups - speed/capacity tradeoff
SSD parameters - ~10% HDD capacity
storage policy
disk controller - bw, qdepth, pass-thru vs raid0
capacity
use SD card to install vSphere & free 2 disk slots
availability
vsan monitoring gui
would like to see historical data added
used esxtop for that
meet the vvol engr team
Derek Uluski, tech lead
Patrick Dirks, Sr. Manager
does not work with SRM yet - huge shift for SRM
whats next for sds?
what are docker linux containers (?)
a control abstraction, collecting storage by how it can be used (policies)
vsan deep dive
product goals
customer: vsphere admin
reduce total cost of ownership (capex & opex)
SDS for vmware
what is it
aggregates local flash & hdds
shared ds for all hosts in the cluster
no single point of failure
scale-out - add nodes
scale-up - increase capacity of existing storage
3-32 nodes
<= 4.4 PB
2M IOPs 100% reads, 640K IOPs 70% reads
highly fault tolerant
resiliency goals in policy
a combination of user and kernel code embedded into ESXi 5.5 to reduce latency
simple cluster config & mgmt
a check box in the new cluster dialog
then automatic or manual device selection
simplified provisioning for applications
pick storage policy for each vm
policy parameters
space reservation
# failures to tolerate
# disk stripes
% flash cache
disk groups
1 flash device + 1-7 magnetic disks
host has up to 5 groups
flash
30% write-back buffer
70% read cache
~10% of HDD capacity (see the sizing sketch after this list)
storage controllers
good queue depth helps
pass-through or RAID0 mode supported
network
layer 2 multicast must be enabled on physical switches
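Quick sizing arithmetic from the rules of thumb above (flash at roughly 10% of HDD capacity, split 30% write buffer / 70% read cache); the disk count and size are just an example:
 # Sizing arithmetic from the rules of thumb above; disk count and size are an example.
 hdd_count, hdd_size_gb = 7, 1200              # one disk group: 1 flash device + up to 7 magnetic disks
 
 hdd_capacity_gb = hdd_count * hdd_size_gb     # 8400 GB
 flash_gb        = 0.10 * hdd_capacity_gb      # flash ~10% of HDD capacity -> 840 GB
 write_buffer_gb = 0.30 * flash_gb             # 30% write-back buffer      -> 252 GB
 read_cache_gb   = 0.70 * flash_gb             # 70% read cache             -> 588 GB
 
 print(flash_gb, write_buffer_gb, read_cache_gb)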
8/27/14 (at VMworld)
NSX-MH reference design
2 flavors of NSX
cloud
compute - provided by hypervisors
storage
network & security - provided by NSX
NSX-MH is for non-ESXi and/or mix of hypervisors
any CMS
any compute
any storage
any network fabric
vsan performance benchmarking
exchange simulation
oltp simulation
olio (lots of vms)
kept RAM/vm low to reduce vm caching and get vsan traffic
analytics
single vm/node
separate ds and inter-vm networks
VPI 2.0 (beta)
data collection appliance
analyzes live vm IO workloads
each vm gets a score as to whether it should be in a vsan cluster
configuring for performance
ssd:md ratio so ssd holds most of working set
stripe width
adjust if % of vms ios being served …
I wonder if they have looked into the impact of using DCE components?
8/28/14 (at VMworld)
Quid - augmented intelligence (vCenter)
SSO
ability to view multiple vCenters from one place
multiple identity sources
ability to use different security policies
web client
inventory service
cache inventory information from the vpxd
allows other products to show up in web client
they are working on hiding the fact that this exists
vCenter server
vpxd
communicates with hypervisors
records stats
services client requests
Vctomcat
health
SRS - stats reporting service
EAM - ESX Agent Manager
log browser
PBSM
SMS + policy engines
services storage views client requests
resource usage
java processes for all these services, except vpxd
performance
biggest issue: resource requirements
may need to tune JVM heap size according to inventory size
minimum system configurations are just that
embedded db for inventory service
requires 2-3K IOPs, depending on load
place on its own spindles, possibly ssds
heaps
must be tuned manually
db performance
vc stores statistics at 5-min intervals
vc saves config changes
vc answers certain client queries
vc persists version
rolls up stats - 30 min, 2 hours, 1 day
purges stats
purges events (if auto-purge is enabled, which is recommended)
purges tasks (...)
topN computation - 10 min, 30 min, 2 hrs, 1 day
SMS data refresh - 2 hrs
vc-to-db latency important (often more so than esx-to-vc latency)
place db and vc close
db traffic is mostly writes
manage db disk growth
~80-85% is stats, events, alarms, tasks
~10-15% is inventory data
640 concurrent operations supported, after that queued
2000 concurrent sessions max
8 provisioning operations/host at a time
so when cloning, can use multiple identical sources to increase the concurrency
128 vMotions/host at a time
8 storage vmotions/host at a time
limits can be changed but not officially supported
beyond vc5.0
5.1 & 5.5: stats tables are partitioned
stats level
level 2 uses 4x more db activity than level 1
level 3 uses 6x more than level 2
level 4 uses 1.4x more than level 3
use vc stats calculator (rough scaling sketch below)
VCOps can be used for more advanced stats
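Back-of-the-envelope scaling implied by those multipliers (level 1 normalized to 1.0); this is just the arithmetic, the official vCenter stats calculator should be used for real sizing:
 # Relative DB activity per stats level from the multipliers quoted above (level 1 = 1.0).
 level1 = 1.0
 level2 = level1 * 4      # ~4x level 1
 level3 = level2 * 6      # ~6x level 2
 level4 = level3 * 1.4    # ~1.4x level 3
 print(level1, level2, level3, round(level4, 1))   # 1.0 4.0 24.0 33.6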
API performance
PowerCLI - simple to use, but involves client-side filtering
web client
C# client uses aggressive refresh of client data
web client decouples client requests from vpxd
3x less load than C# client
make it easier for clients to write plugins - by adding data to inventory service
merge on-premise and hybrid experience
platform independence
reduced refresh frequency
leverages flex
issues
flex has issues
performance - login time, ...
different nav model (they tried to hide things that were used less)
resource requirements
performance
chrome/IE faster than firefox
browser machine should have 2 CPUs & 4GB
browser, app server, & inventory server should be in the same geography
can RDP to a local browser server
size heaps
looking ahead
putting tasks back in their place
right click will work like it used to
improve lateral nav
deployment
single vs multiple vCenters
single reduces latency but requires fully-resourced vm
vCenter performance is sensitive to vc-to-esx latency
sweet spot is 200 hosts / 2000 VMs per vc
separate by
departments
pci/non-pci racks
server/desktop workloads
geographies
SSO and linked mode
SSO does not share roles/privileges/licenses
linked mode
allows this
uses windows-only technology
slower login
slower search
one slow vc can slow everything
blogger #vCenterGuy
future
linux appliance with performance/feature parity with windows
html5
cross-vc vmotion
linked sso convergence
performance