VMworld 2014/Other Notes

8/24/14 (VMworld in SFO)

easy walk from hotel
registered using qr code in email

sw defined data center

    umbrella term for the underlying technologies: vCloud management, VSAN, NSX
    reduces overhead of datacenter IT
    customer reports IT department went from 500 people to 39 - runs business on 6 racks
    vCloud - private cloud
    this is seismic - maybe you felt it early this morning
    vRealize - packages of management components
        enterprise
        smb
        SaaS (sw as a service) - vRealize Air Automation
    new competencies this year: mgmt automation, sw defined storage, networking virtualization
    PaaS (question)
    OpenStack APIs provide access to SDDC infrastructure - VMware contributes to this open source project
        www.openstack.org - Open source software for building private and public clouds.

hybrid cloud strategy

    vCloud Air is a hybrid strategy
    vCloud Air replaces on-premises services, as needed
    same mgmt, networking, & security
    today 6% of workload is in the cloud
    services
        devops
        db as a service
            Microsoft SQL & MySQL
        object storage
            beta using EMC in Sept - GA about EOY
        mobility services
        cloud mgmt
            vRealize Air Automation (formerly vCloud Automation Center)

what's new in vCloud Suite

    dc virtualization & standardization
        vCenter support assistant
            automatic regular data collection with pattern recognition
    security controls native to infrastructure
        vSphere replication improvements
    HA & Resilient Infrastructure
        vCenter Site Recovery Manager (SRM)
            disaster recovery
            disaster avoidance
            planned migration
                can test a proposed migration/upgrade
            new
                vCO plugin
                    APIs are also accessible via PowerCLI
                support for more vms
                faster using batch processing
                integrated with web UI
            works within a local NSX environment, not across the entire NSX environment
            does not support vCloud Air, but the cloud will have this type of functionality someday
    app & infrastructure delivery automation
        vCAC (vCloud Automation Center)
            new interfaces with more flexible workflows
            NSX - control from vSphere
            puppet integration
            localization - 10+ languages

8/25/14 (at VMworld)

Storage DRS deep dive

    this stuff is in the vSphere 6.0 beta
    problem
        shared datastore with 2 different workloads
        you add a backup job, but it uses a lot of your bandwidth
        you want it to use just enough that it finishes on time
    storage performance controls
        shares
            VM is assigned a shares value indicating the relative IOPS load it should get
        limit
            max IOPS allowed per VM
        reservations
            min IOPS per VM
    ESX 5.5 IO scheduler (mClock)
        implements scheduling using the above controls
        breaks large IOs into 32KB chunks for accounting purposes, so the IOPS controls also bound bandwidth (see the sketch at the end of this section)
    storage IO control
        works across hosts that share a store
        congestion detection based on a latency threshold; when exceeded, the host is throttled
        threshold is a setting
    sdrs overview
        balances load by moving vdisks between stores in the storage cluster
        allows vdisks to have affinity for each other, so if one wants to move, the others will also
    sdrs deployment
        you have to understand how this works when using complex storage use cases
            thin
            dedup
            auto-tiering
    sdrs monitors replications
    storage io control best practices
        don't mix vsphere luns and non-vsphere luns
        set host IO queue size to the highest allowed
        set congestion threshold conservatively high
    ds cluster best practices
        similar ds performance
        similar capacities
    ds & host connectivity
        allow max possible connectivity
    vSphere storage policy based management
        now works with different profiles
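    a quick sketch I wrote afterwards (mine, not VMware code) of the mClock accounting idea above - because large IOs are charged in 32KB units, an IOPS limit implicitly caps bandwidth too

        # Toy model of 32KB-unit IO accounting; the numbers below are hypothetical.
        ACCOUNTING_UNIT_KB = 32  # unit size quoted in the session

        def accounting_cost(io_size_kb: int) -> int:
            """Scheduler 'IOs' charged for one guest IO (ceiling division)."""
            return max(1, -(-io_size_kb // ACCOUNTING_UNIT_KB))

        def bandwidth_ceiling_kbps(iops_limit: int, io_size_kb: int) -> float:
            """Effective bandwidth cap implied by an IOPS limit at a given IO size."""
            real_ios_per_sec = iops_limit / accounting_cost(io_size_kb)
            return real_ios_per_sec * io_size_kb

        # A VM limited to 1000 IOPS issuing 256KB IOs is charged 8 units per IO,
        # so it gets ~125 real IOs/s, i.e. ~32 MB/s no matter how large its IOs are.
        print(accounting_cost(256))               # 8
        print(bandwidth_ceiling_kbps(1000, 256))  # 32000.0 KB/s
        print(bandwidth_ceiling_kbps(1000, 4))    # 4000.0 KB/s for small IOs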

How VMware virtual volumes (VVols) will provide shared storage with x-ray vision

    challenges in external storage architectures
    hypervisor can help
    knows the needs of the apps in real time
    global view of infrastructure
    SDS & VVols
        policy-driven control plane
        virtual data plane
            virtual data services
            virtual datastores
                VASA provider is a new player - agent for the array, ESX manages the array via vasa APIs
                arrays are logically partitioned into Storage Containers
                vm disks called virtual volumes are created natively on the Storage Containers
                IO from ESX to the array is through an access point called a Protocol Endpoint (PE), so the data path is essentially unchanged
                advertised data services are offloaded to the array
                managed through policies - no need to do LUN management
    HP 3PAR and VMware

Understanding virtualized memory management performance

    concerns
        vms configured memory size
            too small -> low performance
            too large -> high overhead
        #vms/host
            too many -> low performance
            too few -> wastes host memory
        memory reclamation method
            proper -> minimal performance impact
    layered mem mgmt (app, vm, host)
        each layer assumes it owns all configured memory
        each layer improves mem utilization by using free memory for optimizations
        cross-layer knowledge is limited
    memory undercommit
        sum of all vm memory size <= host memory
        no reclamation
    memory overcommit
        sum > host memory
        ESX may map only a subset of VM memory (reclaims the rest)
    memory entitlement & reclamation
        compute memory entitlement for each VM & reclaim if < consumed
        based on reservation, limit, shares, memory demand
        ESX classifies memory as active & idle
        sample each page each minute & see which were used
    entitlement parameters
        configured memory size (what guest sees)
        reservation (min)
        limit (max)
        shares (relative priority for the VM)
        idle memory
    reclamation techniques
        transparent page sharing - remove duplicate 4K pages in background
            uses content hash (toy sketch at the end of this section)
        ballooning - pushes memory pressure from ESX into VM - used when host free memory > 4% of ESX memory
            allocates pinned memory from guest
            now that we know the guest can't use that memory, it is reclaimed and given to another VM
            possible side effect: cause paging in guest
        swapping & compression
            if ballooning runs out of memory
            randomly chooses a page to compress/swap - use swap if compression savings < 50%
    best practices
        performance goals
            handle burst memory pressure well
            constant memory pressure should be handled by DRS/vMotion, etc
        monitoring tools
            vCenter performance chart, esxtop, memstats
                host level
                use when isolating a problem
            vCenter Operations (vCOps)
                monitor cluster/dc
                determine if you have a problem
        guard against active memory reclamation
            vm mem size > highest demand during peak loads
            if necessary, set reservation above guest demand
            use stats from vCOps manager gui
        page sharing & large page
            memory saving from page sharing good for homogeneous vms
            intra- & inter-vm sharing
            what prevents sharing
                guest has ASLR (address space layout randomization)
                guest has SuperFetch (proactive caching)
                host uses large pages, because ESXi does not share large pages
            why large page
                fewer tlb misses
                faster page table look up time
        impact on memory overcommitment
            sharing broken when any small page is ballooned or swapped
        best practices
            don't disable page sharing
            don't disable host large page, except with VDI
            install vmware tools & enable ballooning
            provide sufficient swap space in guest
            place guest swap file/partition on separate vdisk
            don't disable memory compression
            host cache is nice to have - maybe 20% of ssd - more is potentially wasteful
        optimizations of host swapping
            sharing before swap
            compressing before swap
            swap to host cache ssd
    memory overcommitment guidance (quick sketch at the end of this section)
        configured
            sum of all vm mem sizes / host mem size
            keep > 1
        active
            sum of active vm mem / host mem size
            keep < 1
        use vCenter Operations to track avg & max mem demand
        monitor performance counters
            mem.consumed does not mean anything
            reclamation counters (mem.balloon, swapUsed, compressed, shared) - non-0 values do not mean there is a problem
                it just means these things have done their job somewhere in the past
            mem.swapInRate constant non-0 means problem
            mem.latency - estimates the perf impact due to compression/swapping
            mem.active - if low, reclaimed memory is not a problem
            virtDisk.readRate writeRate
                large means more swapping is happening
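    a toy model I sketched of the transparent page sharing idea above (not the ESX implementation, which as I understand it also verifies page contents on a hash match before sharing)

        import hashlib
        from collections import defaultdict

        PAGE_SIZE = 4096  # small pages; per the notes, large pages are not shared

        def share_pages(pages):
            """Toy page-sharing pass: bucket identical 4K pages by content hash
            so duplicates could be backed by a single machine page."""
            buckets = defaultdict(list)
            for i, page in enumerate(pages):
                buckets[hashlib.sha1(page).hexdigest()].append(i)
            saved = sum(len(ids) - 1 for ids in buckets.values() if len(ids) > 1)
            return {"unique_pages": len(buckets), "pages_saved": saved}

        zero_page = bytes(PAGE_SIZE)       # e.g. zeroed pages in idle guests
        other_page = b"\x01" * PAGE_SIZE
        print(share_pages([zero_page, zero_page, zero_page, other_page]))
        # {'unique_pages': 2, 'pages_saved': 2}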
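    and the overcommitment ratios above as plain arithmetic (my own quick sketch with made-up numbers)

        def overcommit_ratios(vm_mem_mb, vm_active_mb, host_mem_mb):
            """Configured and active overcommit ratios per the guidance above:
            configured may be > 1, but active should stay < 1."""
            return (sum(vm_mem_mb) / host_mem_mb,
                    sum(vm_active_mb) / host_mem_mb)

        # Hypothetical host: 128GB RAM, ten 24GB VMs each actively touching ~8GB.
        configured, active = overcommit_ratios([24576] * 10, [8192] * 10, 131072)
        print(f"configured overcommit: {configured:.2f}")  # 1.88 -> overcommitted, OK
        print(f"active overcommit:     {active:.2f}")      # 0.62 -> fits in RAM, so fine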

IO Filtering

    allows filters to process a VM's IO to its vmdks - inside ESX, outside the VM
    allows 3rd-party data services
    VAIO
    filters running in userspace
    allows for out-of-band releases - isolates filters from kernel
    extremely performant - ~1us latency for filters framework
        the ESX kernel was modified to allow a usermode driver like this to be extremely performant
    general purpose API - raw IO stream
    limit v1 SDK to 2 use cases (for test considerations)
        cache
        replication
    only on vSCSI devices
        vSCSI turns T10 cmds into ioctls - find out more about this (?)
    services
        high performance event queue access
        tight integration with vSphere
        full access to guest ios - synch access
        automated deployment
        flexible - requires user to add vC extensions to manage
    design (conceptual sketch at the end of this section)
        filter driver registers with VAIO
        IO: VM -> VAIO -> filter driver -> VAIO -> hardware
            filter has to send the IO on eventually
        response: hardware -> VAIO -> filter driver -> VAIO -> VM
        filter may initiate its own IOs
        filter may talk to flash or "other" that is recognized as a block device
        filters may share kernel space memory or can use IP sockets
        events indicate when a snapshot or vMotion occurs
        only in C
        need both 32 & 64 bit versions, because esx is a 32-bit OS with a 64-bit process space
        one instance per VMX, must be re-entrant
    EMC RecoverPoint is a partner - to be in their 2015 release
    SanDisk VAIO server side caching
        scalable, distributed r/w cache
    beta Q4 2014 with ESX6.0
        filters must be certified (signed by vmware)
    expect GA early in 2015 (depends on ESX6)
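    a conceptual model of the filter chain described above (the real VAIO filters are C plugins inside ESX; this Python sketch of mine just mirrors the IO path, with a hypothetical caching filter that either completes an IO itself or forwards it)

        class Device:
            """Stands in for the downstream path back through the framework to hardware."""
            def submit(self, io):
                return ("device", io["op"], io["lba"])

        class CacheFilter:
            """Hypothetical read cache - one of the two v1 use cases named above.
            Every IO enters the filter; the filter must complete it or pass it on."""
            def __init__(self, downstream):
                self.downstream = downstream
                self.cache = {}

            def handle(self, io):
                if io["op"] == "read" and io["lba"] in self.cache:
                    return ("cache", "read", io["lba"])       # complete it ourselves
                result = self.downstream.submit(io)           # otherwise forward the IO
                if io["op"] == "write":
                    self.cache[io["lba"]] = io["data"]        # populate cache on writes
                return result

        flt = CacheFilter(Device())
        print(flt.handle({"op": "write", "lba": 7, "data": b"x"}))  # goes to the device
        print(flt.handle({"op": "read", "lba": 7}))                 # served from the cache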

8/26/14 (at VMworld)

SanDisk cache

    virtual SAN - 3-32 nodes share local storage
        contains vmdks
    virtual SAN cache
        30% reserved for write buffer
        storage policy
            failures to tolerate setting
            number disk drives per object (stripe width)
        design considerations
            performance
                #disk groups - speed/capacity tradeoff
                SSD parameters - ~10% HDD capacity
                storage policy
                disk controller - bw, qdepth, pass-thru vs raid0
            capacity
                use SD card to install vSphere & free 2 disk slots
            availability
    vsan monitoring gui
        would like to see historical data added
        used esxtop for that

meet the vvol engr team

    Derek Uluski, tech lead
    Patrick Dirks, Sr. Manager
    does not work with SRM yet - huge shift for SRM

what's next for sds?

    what are docker linux containers (?)
    a control abstraction, collecting storage by how it can be used (policies)

vsan deep dive

    product goals
        customer: vsphere admin
        reduce total cost of ownership (capex & opex)
        SDS for vmware
    what is it
        aggregates local flash & hdds
        shared ds for all hosts in the cluster
        no single point of failure
    scale-out - add nodes
    scale-up - increase capacity of existing storage
    3-32 nodes
    <= 4.4 PB
    2M IOPS at 100% reads, 640K IOPS at 70% reads
    highly fault tolerant
        resiliency goals in policy
    a combination of user and kernel code embedded into ESXi 5.5 to reduce latency
    simple cluster config & mgmt
        a check box in the new cluster dialog
        then automatic or manual device selection
    simplified provisioning for applications
        pick storage policy for each vm
    policy parameters
        space reservation
        # failures to tolerate
        # disk stripes
        % flash cache
    disk groups
        1 flash device + 1-7 magnetic disks
        host has up to 5 groups
    flash
        30% write-back buffer
        70% read cache
        ~10% of hdd (sizing sketch at the end of this section)
    storage controllers
        good queue depth helps
        pass-through or RAID0 mode supported
    network
        layer 2 multicast must be enabled on physical switches
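    pulling the rules of thumb above into one back-of-the-envelope sizing sketch (my own assumptions - mirrored copies for failures-to-tolerate, flash ~10% of HDD, 30/70 buffer/cache split - not an official calculator)

        def vsan_sizing(usable_gb, ftt=1):
            """Rough sizing from the rules of thumb in these notes.
            Assumes mirrored objects, i.e. each object stored (ftt + 1) times."""
            raw_hdd_gb = usable_gb * (ftt + 1)   # replicas consume raw HDD capacity
            flash_gb = 0.10 * raw_hdd_gb         # flash sized at ~10% of HDD capacity
            return {
                "raw_hdd_gb": raw_hdd_gb,
                "flash_gb": flash_gb,
                "write_buffer_gb": 0.30 * flash_gb,  # 30% write-back buffer
                "read_cache_gb": 0.70 * flash_gb,    # 70% read cache
            }

        # 2TB of usable VM data with failures-to-tolerate = 1:
        print(vsan_sizing(2000, ftt=1))
        # -> ~4000 GB raw HDD, ~400 GB flash (~120 GB write buffer, ~280 GB read cache)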

8/27/14 (at VMworld)

NSX-MH reference design

    2 flavors of NSX
    cloud
        compute - provided by hypervisors
        storage
        network & security - provided by NSX
    NSX-MH is for non-ESXi and/or mix of hypervisors
        any CMS
        any compute
        any storage
        any network fabric

vsan performance benchmarking

    exchange simulation
    oltp simulation
    olio (lots of vms)
        kept RAM/vm low to reduce vm caching and get vsan traffic
    analytics
        single vm/node
        separate ds and inter-vm networks
    VPI 2.0 (beta)
        data collection appliance
        analyzes live vm IO workloads
        each vm gets a score as to if it should be in a vsan cluster
    configuring for performance
        ssd:md ratio so ssd holds most of working set
    stripe width
        adjust if % of vms ios being served …
    I wonder if they have looked into the impact of using DCE components?

8/28/14 (at VMworld)

Quid - augmented intelligence (vCenter)

    SSO
        ability to view multiple vCenters from one place
        multiple identity sources
        ability to use different security policies
    web client
    inventory service
        cache inventory information from the vpxd
        allows other products to show up in web client
        they are working on hiding the fact that this exists
    vCenter server
        vpxd
            communicates with hypervisors
            records stats
            services client requests
        Vctomcat
            health
            SRS - stats reporting service
            EAM - ESX Agent Manager
        log browser
        PBSM
            SMS + policy engines
            services storage views client requests
    resource usage
        java processes for all these services, except vpxd
    performance
        biggest issue: resource requirements
    may need to tune JVM heap size according to inventory size
    minimum system configurations are just that
    embedded db for inventory service
        requires 2-3K IOPS, depending on load
        place on its own spindles, possibly ssds
    heaps
        must be tuned manually
    db performance
        vc stores statistics at 5-min intervals
        vc saves config changes
        vc answers certain client queries
        vc persists version
        rolls up stats - 30 min, 2 hours, 1 day
        purges stats
        purges events (if auto-purge is enabled, which is recommended)
        purges tasks (...)
        topN computation - 10 min, 30 min, 2 hrs, 1 day
        SMS data refresh - 2 hrs
        vc-to-db latency important (often more so than esx-to-vc latency)
            place db and vc close
        db traffic is mostly writes
        manage db disk growth
            ~80-85% is stats, events, alarms, tasks
            ~10-15% is inventory data
        640 concurrent operations supported, after that queued
        2000 concurrent sessions max
        8 provisioning operations/host at a time
            so when cloning, can use multiple identical sources to increase the concurrency
        128 vMotions/host at a time
        8 storage vmotions/host at a time
        limits can be changed but not officially supported
    beyond vc5.0
        5.1 & 5.5: stats tables are partitioned
    stats level
        level 2 uses 4x more db activity than level 1
        level 3 uses 6x more than level 2
        level 4 uses 1.4x more than level 3 (cumulative multipliers sketched at the end of these notes)
        use vc stats calculator
        VCOps can be used for more advanced stats
    API performance
        PowerCLI - simple to use, but involves client-side filtering
    web client
        C# client uses aggressive refresh of client data
        web client decouples client requests from vpxd
        3x less load than the C# client
        make it easier for clients to write plugins - by adding data to inventory service
        merge on-premise and hybrid experience
        platform independence
        reduced refresh frequency
        leverages flex
        issues
            flex has issues
            performance - login time, ...
            different nav model (they tried to hide things that were used less)
            resource requirements
        performance
            chrome/IE faster than firefox
            browser machine should have 2 CPUs & 4GB
            browser, app server, & inventory server should be in the same geography
                can RDP to a local browser server
            size heaps
    looking ahead
        putting tasks back in their place
        right click will work like it used to
        improve lateral nav
    deployment
        single vs multiple vCenters
        single reduces latency but requires fully-resourced vm
        vCenter performance is sensitive to vc-to-esx latency
        sweet spot is 200 hosts, 2000 VMs per vc
        separate by
            departments
            pci/non-pci racks
            server/desktop workloads
            geographies
    SSO and linked mode
        SSO does not share roles/privileges/licenses
        linked mode
            allows this
            uses windows-only technology
            slower login
            slower search
            one slow vc can slow everything
        blogger #vCenterGuy
    future
        linux appliance with performance/feature parity with windows
        html5
        cross-vc vmotion
        linked sso convergence
        performance
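    taking the stats-level factors above at face value, a quick sketch of the cumulative db-activity multipliers relative to level 1 (my arithmetic, not a VMware figure)

        # Cumulative DB-activity multipliers for vCenter stats levels, relative to
        # level 1, using the per-level factors quoted in the notes above.
        per_level_factor = {2: 4.0, 3: 6.0, 4: 1.4}

        multiplier = 1.0
        print("level 1: 1.0x")
        for level in (2, 3, 4):
            multiplier *= per_level_factor[level]
            print(f"level {level}: {multiplier:.1f}x vs level 1")
        # level 2: 4.0x, level 3: 24.0x, level 4: 33.6x vs level 1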