
VCF Day 2 Operations Planner

Day 2 operations in VMware Cloud Foundation (VCF) are where most outages happen: cluster expansions go wrong, certificate rotations fail mid-flight, and password changes break SSO. This tool walks through each common Day 2 task as a structured runbook with prerequisites, ordered steps, validation checkpoints, and rollback notes.

Cluster Expansion · Workload Domain · Certificate Rotation · Password Rotation · vSAN Expansion · Host Replacement · NSX Segments

Quick start

  1. Pick the operation — Expand Cluster, Add Workload Domain, Certificate Rotation, Password Rotation, vSAN Expansion, Add NSX Segment, Replace a Failed Host.
  2. Review prerequisites — the tool lists what must be true before starting.
  3. Walk through ordered steps — tick each as you complete it. Steps include UI navigation, expected results, and validation commands.
  4. Add notes — record IPs, account names, ticket numbers next to each step.
  5. Capture as a record — export the completed runbook for change management.

When to use this tool

Use this tool when you need to:

How it works

Each runbook follows the same five-section pattern:

  1. Prerequisites — what must be true before starting (host on HCL, capacity headroom, maintenance window, etc.)
  2. Pre-flight validation — commands/checks to run that confirm the prerequisites are actually met
  3. Execution steps — the actual operation, broken into ordered steps with UI navigation, expected results, and validation between steps
  4. Post-validation — confirm the operation completed successfully (cluster healthy, services up, no alerts)
  5. Rollback — what to do if it goes wrong, when rollback is even possible

The runbook persists in your browser — you can pause mid-execution, close the tab, and resume later.
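The five-section pattern and the pause-and-resume behaviour can be pictured with a small data model. This is only an illustrative sketch: the `Runbook` and `Step` classes, the section keys, and the JSON layout are hypothetical and are not the tool's actual storage format.

```python
import json
from dataclasses import dataclass, field, asdict

# Section names mirror the five-section pattern described above (keys are hypothetical).
SECTIONS = ("prerequisites", "preflight", "execution", "post_validation", "rollback")


@dataclass
class Step:
    text: str
    done: bool = False
    note: str = ""


@dataclass
class Runbook:
    operation: str
    sections: dict = field(default_factory=lambda: {name: [] for name in SECTIONS})

    def add(self, section: str, text: str) -> Step:
        """Append a step to one of the five sections and return it."""
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        step = Step(text)
        self.sections[section].append(step)
        return step

    def to_json(self) -> str:
        """Serialize the whole runbook, including ticks and notes."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Runbook":
        """Rebuild a runbook from a previously saved JSON string."""
        data = json.loads(raw)
        runbook = cls(data["operation"])
        for name, steps in data["sections"].items():
            runbook.sections[name] = [Step(**s) for s in steps]
        return runbook


# Pause: save state. Resume: load it back with ticks and notes intact.
rb = Runbook("Expand Cluster")
rb.add("prerequisites", "Host on HCL").done = True
saved = rb.to_json()
resumed = Runbook.from_json(saved)
```

Serializing the full state, rather than only the current step number, is what lets notes and completion ticks survive a closed tab.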

Step-by-step walkthrough

1. Pick a runbook

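Available runbooks: Expand Cluster, Add Workload Domain, Certificate Rotation, Password Rotation, vSAN Expansion, Add NSX Segment, and Replace a Failed Host.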

2. Read prerequisites carefully

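Skipping prerequisites is the #1 cause of Day 2 operation failures. Examples from the runbooks on this page: the host is on the HCL, DNS records have been created, and an SDDC Manager backup has been taken.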

3. Run pre-flight validation

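Before you start a runbook's execution steps, run its validation commands. Examples from the runbooks on this page: verify host firmware, run the vSAN Health pre-check, and verify the certificate SAN list against the vCenter FQDN and IP.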
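One way to treat pre-flight validation as a hard gate is to collect the checks in one place and refuse to proceed while any of them fails. The sketch below assumes each check is a plain callable returning True or False; `run_preflight` and the two placeholder checks are hypothetical and stand in for whatever real probes you use.

```python
from typing import Callable


def run_preflight(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that crashes is treated as a failure
        if not ok:
            failed.append(name)
    return failed


# Placeholder checks: replace the lambdas with real probes for your environment.
checks = {
    "host firmware verified": lambda: True,
    "vSAN Health pre-check": lambda: False,
}
failed = run_preflight(checks)
if failed:
    print("Do not start execution. Failed checks:", ", ".join(failed))
```

Running every check instead of stopping at the first failure gives you the full list of gaps to close before the maintenance window.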

4. Walk through execution steps

Each step shows what to do, what to expect, how long it typically takes, and what can go wrong. Tick each step complete before moving to the next.
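The "tick each step before moving to the next" rule amounts to a simple ordering guard. The following is a minimal sketch of that idea, not the tool's implementation; the `OrderedSteps` class is hypothetical, and the step names are taken from the cluster expansion example on this page.

```python
from datetime import datetime, timezone


class OrderedSteps:
    """Track completion of steps that must be ticked strictly in order."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.completed_at = [None] * len(self.steps)

    def next_index(self):
        """Return the index of the first incomplete step, or None when done."""
        for i, stamp in enumerate(self.completed_at):
            if stamp is None:
                return i
        return None

    def tick(self, index):
        """Mark a step complete, rejecting any step other than the next one."""
        expected = self.next_index()
        if expected is None:
            raise ValueError("all steps are already complete")
        if index != expected:
            raise ValueError(
                f"step {index + 1} is out of order; step {expected + 1} is next"
            )
        self.completed_at[index] = datetime.now(timezone.utc)


run = OrderedSteps([
    "Commission host in SDDC Manager",
    "Assign host to cluster",
    "Wait for vSAN re-balance",
])
run.tick(0)  # fine: this is the first incomplete step
```

Recording a UTC timestamp at each tick also gives you the completion times that an exported record would need.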

5. Run post-validation

Operation looks complete? Don't walk away yet. Post-validation confirms: clusters healthy in vCenter, SDDC Manager green, no critical alerts, backup of new state captured, affected workloads still functional.

6. Rollback if needed

Each runbook ends with rollback. Some operations (cluster expansion) are easy to roll back; others (cert rotation, password rotation mid-flight) are not. Knowing rollback feasibility informs your maintenance window length.
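As a planning aid, rollback feasibility can be kept as a simple lookup. The classification below is copied from this guide's FAQ answer on rollback and describes completed operations; as noted above, a password rotation interrupted mid-flight is harder to undo than the table suggests. The `plan_for_failure` helper and its return strings are hypothetical.

```python
# True means rollback is practical, per this guide's FAQ answer on rollback.
ROLLBACK_PRACTICAL = {
    "cluster expansion": True,
    "nsx segment add": True,
    "password rotation": True,  # completed rotation; mid-flight is harder
    "certificate rotation (post-completion)": False,
    "vcenter major version upgrade": False,
    "vsan policy change (after data movement)": False,
}


def plan_for_failure(operation: str) -> str:
    """Say what the maintenance window should leave time for if things go wrong."""
    key = operation.strip().lower()
    if key not in ROLLBACK_PRACTICAL:
        return "unknown: check the runbook's rollback section"
    if ROLLBACK_PRACTICAL[key]:
        return "budget time for rollback"
    return "budget time to fix forward or escalate"


print(plan_for_failure("Cluster expansion"))
print(plan_for_failure("vCenter major version upgrade"))
```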

7. Export the completed runbook

Once done, export the runbook with all your notes, completion timestamps, and audit trail. Attach it to your CMDB entry or change record.
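For a sense of what a change-management record might contain, here is a minimal sketch that renders steps, timestamps, and notes as plain text. The `export_record` function, the dictionary keys, and the output layout are all assumptions for illustration; the tool's real export format is not described on this page.

```python
def export_record(operation, steps):
    """Render a runbook as plain text.

    steps: list of dicts with 'text', 'completed_at' (ISO string or None), 'note'.
    """
    lines = [f"Runbook: {operation}"]
    for number, step in enumerate(steps, start=1):
        status = step["completed_at"] or "NOT COMPLETED"
        lines.append(f"{number}. {step['text']} [{status}]")
        if step.get("note"):
            lines.append(f"   note: {step['note']}")
    return "\n".join(lines)


# Placeholder data: the timestamp and ticket number are made up for the example.
record = export_record(
    "Expand Cluster",
    [
        {
            "text": "Commission host in SDDC Manager",
            "completed_at": "2024-01-01T10:00:00Z",
            "note": "ticket CHG-0000 (placeholder)",
        },
        {"text": "Assign host to cluster", "completed_at": None, "note": ""},
    ],
)
print(record)
```

Marking unfinished steps explicitly keeps a paused or aborted runbook from being mistaken for a completed one in the change record.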

Examples

Example · Cluster expansion runbook

Operation: Add 1 host to an existing 4-host vSAN ESA cluster.

  • Prerequisites: host on HCL, DNS records created, license headroom, vSAN capacity headroom, network ports configured
  • Pre-flight: verify host firmware, run vSAN Health pre-check
  • Execution: commission host in SDDC Manager → assign to cluster → wait for vSAN re-balance
  • Post-validation: vSAN Health green, all 5 hosts in HA cluster
  • Rollback: decommission via SDDC Manager (vSAN data automatically migrates back to the remaining hosts)
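
The "DNS records created" prerequisite in this example can be checked from any machine that uses the same DNS servers. The sketch below uses Python's standard `socket` module to confirm that forward and reverse lookups agree; the `dns_matches` helper, the host name, and the IP address are hypothetical placeholders, and the check covers IPv4 only.

```python
import socket


def dns_matches(fqdn, expected_ip,
                forward=socket.gethostbyname,
                reverse=socket.gethostbyaddr):
    """Return True when fqdn resolves to expected_ip and the IP resolves back."""
    try:
        if forward(fqdn) != expected_ip:
            return False
        name, _aliases, _addresses = reverse(expected_ip)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses
        return False
    return name.rstrip(".").lower() == fqdn.rstrip(".").lower()


# Real usage would be: dns_matches("esx05.example.test", "192.0.2.15")
# Here the resolvers are replaced with stand-ins so the example runs offline.
ok = dns_matches(
    "esx05.example.test",
    "192.0.2.15",
    forward=lambda name: "192.0.2.15",
    reverse=lambda ip: ("ESX05.example.test.", [], [ip]),
)
print(ok)
```

The resolver functions are parameters only so the check can be exercised without a network; by default it queries the system resolver.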

Example · Certificate rotation runbook

Operation: Rotate vCenter certificate to new CA-signed cert.

  • Prerequisites: SDDC Manager backup, new cert + chain in correct PEM format, all SANs covered
  • Pre-flight: verify cert SAN list against vCenter FQDN + IP, validate cert chain
  • Execution: upload cert to SDDC Manager → trigger rotation workflow → wait 15-30 min
  • Post-validation: connect to vCenter via browser, verify new cert; check NSX/SDDC integrations still work
  • Rollback: SDDC Manager keeps previous cert; revert via UI if rotation fails
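
The SAN pre-flight in this example comes down to a set comparison once the SAN entries have been read out of the new certificate. The sketch below assumes you already have those entries as strings, however you extract them; `missing_sans` is a hypothetical helper, the names and addresses are placeholders, and wildcard entries are deliberately not handled.

```python
import ipaddress


def _normalize(entry):
    """Canonicalize an IP address, or lower-case a DNS name and drop a trailing dot."""
    text = entry.strip()
    try:
        return str(ipaddress.ip_address(text))
    except ValueError:
        return text.rstrip(".").lower()


def missing_sans(required, cert_sans):
    """Return the required names/IPs that the certificate's SAN list does not cover."""
    present = {_normalize(s) for s in cert_sans}
    return [r for r in required if _normalize(r) not in present]


# Placeholder values: the vCenter FQDN and IP that the certificate must cover.
gaps = missing_sans(
    required=["vcenter01.example.test", "192.0.2.20"],
    cert_sans=["VCENTER01.example.test"],
)
print(gaps)
```

An empty result means every required name and address is covered; anything else is a reason to reissue the certificate before starting the rotation.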

Common mistakes

Skipping the SDDC Manager backup
Most Day 2 operations touch SDDC Manager state. A botched operation without a backup is catastrophic. Before any significant operation, always take a backup and verify that it is restorable.
Doing rotations during business hours
Cert and password rotations briefly disrupt SSO and integrations. Even when they are "supposed to be transparent," they are not in practice. Schedule maintenance windows.
Not testing on a non-prod environment first
Operations like NSX upgrades, large vSAN policy changes, or password rotations should be rehearsed in non-prod first. Most VCF outages are operator error on the first execution of a rare operation.
Forgetting downstream consumers of credentials/certs
Backup tools, monitoring agents, automation systems, and external SSO integrations all consume vCenter creds/certs. Rotating without updating them causes silent integration failures. Inventory consumers before rotating.
Pausing mid-operation without documenting state
If you pause cert rotation between "uploaded new cert" and "applied to all components," the system is in an inconsistent state. Document explicitly where you stopped before walking away.
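
For the advice to inventory consumers before rotating, even a small mapping kept alongside the runbook helps. The sketch below is illustrative only: the credential and certificate names are hypothetical, and the consumer categories are the ones listed in the mistake above.

```python
# Hypothetical inventory: which systems consume which credential or certificate.
CONSUMERS = {
    "vcenter-service-account": ["backup tool", "monitoring agent", "automation system"],
    "vcenter-certificate": ["external SSO integration", "monitoring agent"],
}


def consumers_to_update(rotating):
    """List every consumer affected by the items being rotated, without duplicates."""
    seen = []
    for item in rotating:
        for consumer in CONSUMERS.get(item, []):
            if consumer not in seen:
                seen.append(consumer)
    return seen


todo = consumers_to_update(["vcenter-service-account", "vcenter-certificate"])
print(todo)
```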


FAQ

Can I add my own custom runbooks?
Not currently — runbooks ship as built-in. Custom runbook authoring is on the roadmap.
Does the tool execute the operations for me?
No — it's a guide. You execute each step manually via SDDC Manager UI, vCenter, or CLI. The deliberate human-in-the-loop is a feature.
Why does cert rotation need a maintenance window?
During rotation, components briefly fail to validate each other's certs. SSO logins fail, the NSX-to-vCenter integration is disrupted, and backup tool authentication breaks.
Can I roll back any Day 2 operation?
No. Cluster expansion, NSX segment add, password rotation: yes. Certificate rotation post-completion, vCenter major version upgrade, vSAN policy change after data movement: practically no. Each runbook tells you which is which.
How do I handle a partial failure mid-runbook?
Stop. Document where the failure occurred. Check if rollback is possible at that point. If not, escalate. Don't continue forward without understanding the failure.
Should I do multiple Day 2 ops in one maintenance window?
Generally no. Each operation has its own risk; combining them combines the risks AND makes troubleshooting harder. Rare exception: bundled changes that VMware explicitly recommends doing together.