
VCF Day 2 Operations Planner

Most VCF outages happen during Day 2 operations: cluster expansions go wrong, certificate rotations fail mid-flight, and password changes break SSO. This tool walks through each common Day 2 task as a structured runbook with prerequisites, ordered steps, validation checkpoints, and rollback notes.



Quick start

Pick the operation — Expand Cluster, Add Workload Domain, Certificate Rotation, Password Rotation, vSAN Expansion, Add NSX Segment, Replace a Failed Host.
Review prerequisites — the tool lists what must be true before starting.
Walk through ordered steps — tick each as you complete it. Steps include UI navigation, expected results, and validation commands.
Add notes — record IPs, account names, ticket numbers next to each step.
Capture as a record — export the completed runbook for change management.

On this page

When to use this tool
How it works
Step-by-step walkthrough
Examples
Common mistakes
Related tools
FAQ

When to use this tool

Use this tool when you need to:

Run any common Day 2 VCF operation — the runbooks cover the operations that go wrong most often.
Hand off operational tasks to a team member who hasn't done that specific operation before.
Document the change with a structured trail of completed steps and your inline notes.
Plan rollback ahead of time — each runbook ends with a rollback section if reversal is needed.
Train new VCF operators — runbooks make implicit knowledge explicit.

How it works

Each runbook follows the same five-section pattern:

Prerequisites — what must be true before starting (host on HCL, capacity headroom, maintenance window, etc.)
Pre-flight validation — commands/checks to run that confirm the prerequisites are actually met
Execution steps — the actual operation, broken into ordered steps with UI navigation, expected results, and validation between steps
Post-validation — confirm the operation completed successfully (cluster healthy, services up, no alerts)
Rollback: what to do if the operation goes wrong, and whether rollback is possible at all

The runbook persists in your browser — you can pause mid-execution, close the tab, and resume later.
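The five-section pattern and the pause-and-resume behaviour can be pictured with a small data model. This is only a sketch under assumptions: the names `Runbook`, `Step`, and `SECTIONS` are hypothetical, and JSON serialization stands in for whatever storage the tool actually uses in the browser.

```python
import json
from dataclasses import dataclass, field, asdict

# Section names follow the five-section pattern described above (hypothetical identifiers).
SECTIONS = ("prerequisites", "pre_flight", "execution", "post_validation", "rollback")


@dataclass
class Step:
    text: str
    done: bool = False
    note: str = ""


@dataclass
class Runbook:
    name: str
    sections: dict = field(default_factory=lambda: {s: [] for s in SECTIONS})

    def add(self, section: str, text: str) -> Step:
        """Append a step to one of the five sections and return it."""
        if section not in SECTIONS:
            raise ValueError(f"unknown section: {section}")
        step = Step(text)
        self.sections[section].append(step)
        return step

    def to_json(self) -> str:
        """Serialize the full runbook state so it can be saved and resumed later."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Runbook":
        """Rebuild a runbook, including tick state and notes, from saved JSON."""
        data = json.loads(raw)
        runbook = cls(data["name"])
        for section, steps in data["sections"].items():
            runbook.sections[section] = [Step(**s) for s in steps]
        return runbook
```

Saving the output of `to_json()` and later passing it to `Runbook.from_json()` restores every tick and note, which is the behaviour the pause-and-resume description implies.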

Step-by-step walkthrough

1. Pick a runbook

Available runbooks:

Expand Cluster — add 1+ hosts to an existing cluster
Add Workload Domain — create a new WLD on existing management infra
Certificate Rotation — rotate certs on vCenter, SDDC Manager, NSX Managers, ESXi hosts
Password Rotation — rotate root, admin, SSO, NSX passwords
vSAN Expansion — add capacity (more disks, more disk groups, or more hosts)
Add NSX Segment — create new overlay or VLAN-backed segment
Replace a Failed Host — RMA an ESXi host and bring its replacement online

2. Read prerequisites carefully

Skipping prerequisites is the #1 cause of Day 2 operation failures. Examples:

Cluster expansion: new host on HCL, IP/DNS pre-configured, host commissioned, sufficient license capacity
Cert rotation: backup taken, new certs from CA in correct format, downtime window approved
Password rotation: SDDC Manager backup verified restorable, all dependent integrations identified

3. Run pre-flight validation

Before you execute a runbook, run its validation commands. Examples:

For host expansion: esxcli system version get on the new host, to confirm its ESXi version matches the cluster
For cert rotation: openssl x509 -in newcert.pem -noout -text to inspect the certificate's SAN entries
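The version check for host expansion can be scripted once you have captured the esxcli output from the new host and from an existing cluster member. The sketch below assumes the output is a list of "Key: Value" lines that includes Version and Build; the function names and the sample text (including the placeholder build numbers) are illustrative, not real output.

```python
def parse_esxcli_version(output: str) -> dict:
    """Parse 'Key: Value' lines from captured `esxcli system version get` output."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def versions_match(new_host_output: str, cluster_host_output: str) -> bool:
    """Return True only if both outputs report the same Version and Build."""
    new = parse_esxcli_version(new_host_output)
    ref = parse_esxcli_version(cluster_host_output)
    fields = ("Version", "Build")
    # Treat missing fields as a mismatch rather than guessing.
    if any(f not in new or f not in ref for f in fields):
        return False
    return all(new[f] == ref[f] for f in fields)


# Illustrative sample text with placeholder build numbers.
SAMPLE_CLUSTER = """\
   Product: VMware ESXi
   Version: 8.0.2
   Build: Releasebuild-00000001
"""
SAMPLE_NEW = SAMPLE_CLUSTER.replace("00000001", "00000002")

if __name__ == "__main__":
    print(versions_match(SAMPLE_NEW, SAMPLE_CLUSTER))
```

Comparing the build as well as the version is a deliberate choice here: two hosts can report the same version string while sitting at different patch levels.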

4. Walk through execution steps

Each step shows what to do, what to expect, how long it typically takes, and what can go wrong. Tick each step complete before moving to the next.
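If you track steps in your own script alongside the tool, the "tick each step before moving on" rule can be enforced in a few lines. This is a hypothetical helper, not part of the tool; whether the tool itself blocks out-of-order ticks is not stated here.

```python
def tick(steps: list, index: int) -> None:
    """Mark steps[index] done, refusing if any earlier step is still open."""
    pending = [s["text"] for s in steps[:index] if not s["done"]]
    if pending:
        raise RuntimeError(f"complete earlier steps first: {pending}")
    steps[index]["done"] = True


if __name__ == "__main__":
    steps = [
        {"text": "Commission host in SDDC Manager", "done": False},
        {"text": "Assign host to cluster", "done": False},
    ]
    tick(steps, 0)
    tick(steps, 1)
    print(all(s["done"] for s in steps))
```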

5. Run post-validation

Operation looks complete? Don't walk away yet. Post-validation confirms: clusters healthy in vCenter, SDDC Manager green, no critical alerts, backup of new state captured, affected workloads still functional.

6. Rollback if needed

Each runbook ends with rollback. Some operations (cluster expansion) are easy to roll back; others (cert rotation, password rotation mid-flight) are not. Knowing rollback feasibility informs your maintenance window length.

7. Export the completed runbook

Once done, export the runbook with all your notes, completion timestamps, and audit trail. Attach it to your CMDB entry or change record.
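As a rough picture of what an exported record could contain, the sketch below bundles step text, notes, and completion timestamps into JSON. The field names (`runbook`, `exported_at`, `completed_at`, and so on) are assumptions made for illustration; the tool's real export format is not documented here.

```python
import json
from datetime import datetime, timezone


def export_record(name: str, steps: list, now: datetime = None) -> str:
    """Build a JSON change record from a list of step dicts (hypothetical schema)."""
    now = now or datetime.now(timezone.utc)
    # A step counts as complete only if it carries a completion timestamp.
    open_steps = [s["text"] for s in steps if not s.get("completed_at")]
    record = {
        "runbook": name,
        "exported_at": now.isoformat(),
        "complete": not open_steps,
        "open_steps": open_steps,
        "steps": steps,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    demo_steps = [
        {"text": "Commission host", "note": "see change ticket",
         "completed_at": "2024-01-01T10:00:00+00:00"},
        {"text": "Wait for vSAN re-balance", "note": "", "completed_at": None},
    ]
    print(export_record("Expand Cluster", demo_steps))
```

Flagging open steps in the record makes it obvious to a change reviewer when a runbook was exported before it was finished.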

Examples

Example · Cluster expansion runbook

Operation: Add 1 host to an existing 4-host vSAN ESA cluster.

Prerequisites: host on HCL, DNS records created, license headroom, vSAN capacity headroom, network ports configured
Pre-flight: verify host firmware, run vSAN Health pre-check
Execution: commission host in SDDC Manager → assign to cluster → wait for vSAN re-balance
Post-validation: vSAN Health green, all 5 hosts in HA cluster
Rollback: remove the host from the cluster in SDDC Manager, then decommission it (vSAN evacuates its data to the remaining hosts)

Example · Certificate rotation runbook

Operation: Rotate vCenter certificate to new CA-signed cert.

Prerequisites: SDDC Manager backup, new cert + chain in correct PEM format, all SANs covered
Pre-flight: verify cert SAN list against vCenter FQDN + IP, validate cert chain
Execution: upload cert to SDDC Manager → trigger rotation workflow → wait 15-30 min
Post-validation: connect to vCenter via browser, verify new cert; check NSX/SDDC integrations still work
Rollback: SDDC Manager keeps previous cert; revert via UI if rotation fails
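The pre-flight SAN check in this example can be scripted. The sketch below assumes you have copied the Subject Alternative Name line from the openssl text output, where entries look like "DNS:name" and "IP Address:addr" separated by commas; the function names and the example hostname and IP are hypothetical.

```python
def parse_san_line(line: str) -> set:
    """Split an openssl SAN line such as 'DNS:a.example.com, IP Address:10.0.0.5'."""
    entries = set()
    for item in line.split(","):
        # Partition on the first colon only, so IPv6 addresses stay intact.
        kind, sep, value = item.strip().partition(":")
        if sep:
            entries.add((kind.strip(), value.strip().lower()))
    return entries


def missing_sans(san_line: str, fqdn: str, ip: str) -> list:
    """Return the required (type, value) SAN entries that the certificate lacks."""
    required = {("DNS", fqdn.lower()), ("IP Address", ip.lower())}
    return sorted(required - parse_san_line(san_line))


if __name__ == "__main__":
    san = "DNS:vc01.example.com, DNS:vc01"
    print(missing_sans(san, "vc01.example.com", "10.0.0.10"))
```

An empty result means both the FQDN and the IP are covered; anything else is a reason to request a new certificate before the maintenance window starts.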

Common mistakes

🚨 Skipping the SDDC Manager backup

Most Day 2 operations touch SDDC Manager state. A botched operation without a backup is catastrophic. Always back up, and always verify the backup is restorable, before any significant operation.

⚠ Doing rotations during business hours

Cert and password rotations briefly disrupt SSO and integrations. Even when they are "supposed to be transparent," they are not in practice. Schedule maintenance windows.

⚠ Not testing on a non-prod environment first

Operations like NSX upgrades, large vSAN policy changes, or password rotations should be rehearsed in non-prod first. Most VCF outages are operator error on first execution of a rare operation.

⚠ Forgetting downstream consumers of credentials/certs

Backup tools, monitoring agents, automation systems, and external SSO integrations all consume vCenter creds/certs. Rotating without updating them causes silent integration failures. Inventory consumers before rotating.

⚠ Pausing mid-operation without documenting state

If you pause cert rotation between "uploaded new cert" and "applied to all components," the system is in an inconsistent state. Document explicitly where you stopped before walking away.


FAQ

Can I add my own custom runbooks?

Not currently. Runbooks are built in; custom runbook authoring is on the roadmap.

Does the tool execute the operations for me?

No — it's a guide. You execute each step manually via SDDC Manager UI, vCenter, or CLI. The deliberate human-in-the-loop is a feature.

Why does cert rotation need a maintenance window?

During rotation, components briefly fail to validate each other's certs. SSO logins fail, the NSX-to-vCenter integration is disrupted, and backup tool authentication breaks.

Can I roll back any Day 2 operation?

No. Cluster expansion, NSX segment add, password rotation: yes. Certificate rotation post-completion, vCenter major version upgrade, vSAN policy change after data movement: practically no. Each runbook tells you which is which.

How do I handle a partial failure mid-runbook?

Stop. Document where the failure occurred. Check if rollback is possible at that point. If not, escalate. Don't continue forward without understanding the failure.

Should I do multiple Day 2 ops in one maintenance window?

Generally no. Each operation has its own risk; combining them combines the risks AND makes troubleshooting harder. Rare exception: bundled changes that VMware explicitly recommends doing together.