Cloud Migration Planning: The Checklist That Saves Months

Cloud migrations fail in predictable ways — undiscovered dependencies, underestimated data volumes, poor rollback planning. The pre-migration work that avoids these failures is unglamorous and essential.

Cloud migrations are among the highest-risk infrastructure projects a business undertakes. The blast radius of a failed migration is broad: business disruption, data integrity questions, customer impact, and weeks of recovery work. Most of these failures are preventable, because they consistently trace back to the same causes: insufficient pre-migration discovery, poor rollback planning, and underestimated complexity.

This checklist is the pre-migration work that converts a high-risk project into a structured operational exercise.

Phase 1: Discovery — What You’re Actually Migrating

The first failure mode: migrating what you think you have rather than what you actually have. The discovery phase builds an accurate inventory before any migration work begins.

Application inventory. For every application in scope:

  • What does it do? (Not the marketing description — the technical description)
  • What inbound and outbound network dependencies does it have?
  • What other applications depend on it?
  • What’s the normal load profile? Peak hours, peak load, baseline?
  • What’s the acceptable downtime window for migration? (Some applications can tolerate a maintenance window; others require zero-downtime cutover)

Network dependency mapping is where teams consistently underestimate scope. Applications talk to each other in ways that aren’t documented: a development database that was supposed to be isolated but has three undocumented integrations, or a scheduled job that pulls data from a service outside the migration scope. Build the dependency map from network traffic analysis rather than from documentation, because documentation is almost always incomplete.
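One way to turn traffic analysis into a dependency map is to group observed connections by source and flag any connection that crosses the migration boundary. The sketch below is illustrative only: the host names, the flow records, and the in_scope set are invented, and real input would come from whatever flow logs or packet captures your network provides.

```python
from collections import defaultdict

# Hypothetical flow records: (source_host, destination_host, destination_port).
observed_flows = [
    ("web-01", "app-01", 8080),
    ("app-01", "db-prod", 5432),
    ("app-01", "db-dev", 5432),
    ("batch-01", "billing-api", 443),
]

# Hosts that the documentation lists as in scope (assumed for illustration).
in_scope = {"web-01", "app-01", "db-prod", "batch-01"}


def build_dependency_map(flows):
    """Group observed (destination, port) pairs by source host."""
    deps = defaultdict(set)
    for src, dst, port in flows:
        deps[src].add((dst, port))
    return deps


def boundary_crossings(flows, scope):
    """Return flows where exactly one endpoint is inside the migration scope."""
    return sorted(
        (src, dst, port)
        for src, dst, port in flows
        if (src in scope) != (dst in scope)
    )


if __name__ == "__main__":
    for src, dst, port in boundary_crossings(observed_flows, in_scope):
        print(f"{src} -> {dst}:{port} crosses the migration boundary")
```

Every flagged flow is a decision to make before cutover: bring the other endpoint into scope, or plan connectivity between the old and new environments.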

Data inventory. For every database and data store:

  • Total data volume (in GB/TB)
  • Growth rate (GB/day or GB/month)
  • Read/write patterns (primarily read? primarily write? mixed?)
  • Latency requirements (how long can reads take? writes?)
  • Consistency requirements (ACID? eventual consistency acceptable?)
  • Retention requirements (any compliance-driven data retention?)

The data volume and growth rate determine migration duration. A 5TB database migrated over a connection that sustains 1Gbps takes roughly 11 hours for initial transfer. A 50TB database takes about 4.6 days. Real throughput is usually below the rated link speed, so treat these as lower bounds. Replication lag during live migration depends on write rate. These numbers matter for planning the cutover window.
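The transfer arithmetic is worth checking for your own volumes and links. A minimal sketch, assuming decimal terabytes (10^12 bytes) and a link that sustains a fixed fraction of its rated speed; the function name and the efficiency parameter are illustrative. At a sustained 1 Gbps, 5 TB works out to about 11 hours and 50 TB to about 4.6 days, and at 100 Mbps each figure is ten times larger.

```python
def transfer_hours(data_tb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_tb terabytes over a link rated at link_mbps.

    efficiency is the fraction of the rated speed actually sustained
    (1.0 means the full line rate, which real transfers rarely reach).
    """
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    for size_tb in (5, 50):
        for mbps in (100, 1000):
            hours = transfer_hours(size_tb, mbps)
            print(f"{size_tb} TB at {mbps} Mbps: {hours:.1f} h ({hours / 24:.1f} days)")
```

Running the numbers with a realistic efficiency (for example 0.7) gives a more honest input to the cutover window than the line rate does.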

Infrastructure inventory. For every server, container, or managed service:

  • Operating system and version
  • Installed software and versions
  • Configuration management: is this server reproducible from code? Or is it a snowflake?
  • Hardware-specific requirements (specific CPU features, GPU, specific I/O performance?)
  • Licensing: is any software licensed per-host that would require re-licensing in cloud?

Snowflake servers — servers that were configured manually and whose configuration exists nowhere but in the running server — are a major migration risk. Before migrating a snowflake, invest in documenting or re-codifying its configuration. Migrating a snowflake means you’re migrating the unknown.
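A first step toward re-codifying a snowflake is measuring how far the running server has drifted from whatever is written down. The sketch below compares a declared package manifest with an observed one. The package names and versions are invented, and in practice the observed side would come from the package manager on the server itself.

```python
def config_drift(declared: dict, observed: dict) -> dict:
    """Compare a codified package manifest against what is actually installed."""
    declared_names, observed_names = set(declared), set(observed)
    return {
        # Installed on the server but absent from code: the snowflake part.
        "undeclared": sorted(observed_names - declared_names),
        # Declared in code but not found on the server.
        "missing": sorted(declared_names - observed_names),
        # Present on both sides, but at different versions.
        "version_mismatch": sorted(
            (name, declared[name], observed[name])
            for name in declared_names & observed_names
            if declared[name] != observed[name]
        ),
    }


# Hypothetical manifests for illustration only.
declared_packages = {"nginx": "1.24.0", "openssl": "3.0.13"}
observed_packages = {"nginx": "1.24.0", "openssl": "3.0.11", "imagemagick": "6.9.12"}

drift = config_drift(declared_packages, observed_packages)

if __name__ == "__main__":
    for kind, items in drift.items():
        print(kind, items)
```

An empty result in all three categories is a reasonable signal that the server can be rebuilt from code. Anything else is configuration that still lives only on the running machine.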

Phase 2: Assessment — What Can Move and How

Not everything migrates the same way. The standard migration strategy taxonomy (the 6 Rs):

Rehost (lift and shift) — move the application to cloud compute with minimal changes. Fastest to execute, least optimized for cloud. The right choice when time pressure is high or when modernization can happen post-migration.

Replatform — make small optimizations while migrating. Swap a self-managed MySQL server for RDS while keeping the application code unchanged. You get the managed-service benefit with most of the speed of lift and shift.

Refactor/Re-architect — rebuild the application to be cloud-native. Highest value long-term; highest effort and risk.

Repurchase — replace the application with a SaaS alternative. A self-hosted CRM migrated to Salesforce, for example. Evaluate before assuming everything needs to move to cloud infrastructure.

Retain — some workloads shouldn’t migrate. Compliance requirements, latency requirements, cost profile, or technology constraints may make cloud migration the wrong choice for specific workloads.

Retire — some workloads should be shut down. The migration forces the inventory exercise that surfaces applications nobody’s using.

Assign a strategy to each application and validate with the application owners before committing to a timeline. The rehost vs. replatform decision significantly affects migration duration; the refactor decision requires its own project planning.
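Recording the assignments as data makes the two follow-up checks mechanical: which applications still lack owner sign-off, and which ones need their own project plan. A minimal sketch, with invented application names and a hypothetical Assignment record:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    REHOST = "rehost"
    REPLATFORM = "replatform"
    REFACTOR = "refactor"
    REPURCHASE = "repurchase"
    RETAIN = "retain"
    RETIRE = "retire"


@dataclass
class Assignment:
    application: str
    strategy: Strategy
    owner_approved: bool = False


def awaiting_owner_approval(assignments):
    """Applications whose strategy the owner has not yet validated."""
    return [a.application for a in assignments if not a.owner_approved]


def needs_own_project(assignments):
    """Refactors are planned as separate projects, not as migration tasks."""
    return [a.application for a in assignments if a.strategy is Strategy.REFACTOR]


# Hypothetical plan for illustration only.
plan = [
    Assignment("billing-api", Strategy.REPLATFORM, owner_approved=True),
    Assignment("legacy-reports", Strategy.RETIRE),
    Assignment("order-service", Strategy.REFACTOR, owner_approved=True),
]

strategy_counts = Counter(a.strategy for a in plan)

if __name__ == "__main__":
    print("Awaiting approval:", awaiting_owner_approval(plan))
    print("Separate projects:", needs_own_project(plan))
```

A timeline is only worth committing to once the awaiting-approval list is empty.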

Phase 3: Rollback Planning

This is the phase that consistently gets insufficient attention. The questions that rollback planning answers:

Can you roll back if the migration fails? For a lift-and-shift migration, rollback usually means reverting DNS to the original infrastructure while the migrated environment is debugged. This requires that the original infrastructure stays running until the migration is validated — which has cost implications.

What’s the rollback window? DNS TTL determines how long a cutover takes to propagate fully and how long a rollback takes to reach clients once the change is reverted. Lower the DNS TTL to 60 seconds at least 24 hours before migration, so that records cached under the old, longer TTL have expired by the time you cut over. Don’t discover a 3600-second TTL during the cutover window.
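The lead time matters because resolvers may keep serving a cached record for up to the old TTL after the TTL is lowered. A small sketch of that timing check, assuming resolvers honor the published TTL (some do not); the function names and dates are illustrative:

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Time by which records cached under the old TTL should have expired."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def ready_for_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int,
                      planned_cutover: datetime) -> bool:
    """True if the planned cutover falls after the old TTL has fully aged out."""
    return planned_cutover >= earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds)


if __name__ == "__main__":
    lowered = datetime(2024, 6, 1, 9, 0)   # hypothetical time the TTL was lowered
    cutover = datetime(2024, 6, 2, 9, 0)   # planned cutover, 24 hours later
    print(ready_for_cutover(lowered, 3600, cutover))     # old TTL of 1 hour
    print(ready_for_cutover(lowered, 172800, cutover))   # old TTL of 48 hours
```

With a one-hour old TTL, a 24-hour lead is comfortable. With a 48-hour old TTL it is not, which is why the old value is worth checking well before the migration date.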

What’s the data rollback strategy? If writes happen to the new environment during the cutover window and you need to roll back, how do you handle the data written after the last sync point? This requires explicit design: either no writes during the validation window, or a bidirectional sync that keeps the original environment current so traffic can be reverted to it, or an accepted data loss window that’s documented and agreed upon.

Who has authority to call a rollback? Migrations often encounter unexpected issues. Having clear criteria for when to roll back vs. push through, and clear authority for who makes that call, prevents the scenario where the team debates rollback while customers are impacted.
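Rollback criteria are easiest to apply under pressure when they are written down as numbers before the migration starts. The sketch below shows one possible shape. The thresholds and field names are placeholders, not recommendations, and the named decision owner still makes the final call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackCriteria:
    max_error_rate: float        # fraction of failed requests, e.g. 0.05 for 5%
    max_p95_latency_ms: float    # 95th percentile latency ceiling
    max_minutes_degraded: int    # how long a breach is tolerated before rollback
    decision_owner: str          # the one person with authority to call it


def should_roll_back(criteria: RollbackCriteria, error_rate: float,
                     p95_latency_ms: float, minutes_degraded: int) -> bool:
    """True when a threshold is breached and the breach has lasted too long."""
    breached = (
        error_rate > criteria.max_error_rate
        or p95_latency_ms > criteria.max_p95_latency_ms
    )
    return breached and minutes_degraded >= criteria.max_minutes_degraded


# Hypothetical criteria agreed before the migration.
criteria = RollbackCriteria(
    max_error_rate=0.05,
    max_p95_latency_ms=1000,
    max_minutes_degraded=10,
    decision_owner="migration-lead",
)

if __name__ == "__main__":
    if should_roll_back(criteria, error_rate=0.08, p95_latency_ms=400, minutes_degraded=12):
        print(f"Criteria met: escalate to {criteria.decision_owner} for the rollback call")
```

The point of the structure is that the debate about thresholds happens in planning, not while customers are impacted.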

Write the rollback plan before the migration begins. Test the rollback in a non-production environment if possible.

Phase 4: Cutover Planning

The cutover — the moment when production traffic moves from the old environment to the new — is the highest-risk moment of the migration. Plan it in detail.

The cutover runbook should be hour-by-hour. Who is responsible for each step? What’s the verification check at each step? What’s the trigger for the next step? This isn’t over-engineering — it’s the difference between a coordinated team executing a practiced procedure and a team improvising under pressure.
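A runbook kept as structured data can be checked for gaps before the cutover night. This is a sketch only: the step contents, times, and owner names are invented, and the same check works just as well in a spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class RunbookStep:
    start: str          # planned start time, "HH:MM"
    action: str         # what gets done
    owner: str          # who is responsible
    verification: str   # how the step is confirmed
    next_trigger: str   # what signals that the next step may begin


def incomplete_steps(steps):
    """Return the actions of steps missing an owner, a check, or a trigger."""
    return [
        s.action
        for s in steps
        if not (s.owner.strip() and s.verification.strip() and s.next_trigger.strip())
    ]


# Hypothetical runbook excerpt for illustration only.
runbook = [
    RunbookStep("22:00", "Freeze writes on old database", "dba-oncall",
                "Write count stays flat for 5 minutes", "DBA confirms in the migration channel"),
    RunbookStep("22:15", "Run final data sync", "dba-oncall",
                "Row counts match on both sides", "Sync job exits cleanly"),
    RunbookStep("22:45", "Switch DNS to new environment", "",
                "New load balancer receives traffic", ""),
]

if __name__ == "__main__":
    for action in incomplete_steps(runbook):
        print(f"Incomplete runbook step: {action}")
```

Any step the check flags is a step someone would otherwise have to improvise during the cutover.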

Pre-cutover validation. Run the new environment in parallel before cutting traffic over. Send a percentage of real traffic to the new environment (if your infrastructure supports this). Run synthetic load tests against the new environment. Verify that monitoring, alerting, and on-call procedures are configured for the new environment — not just the old one.

Communication plan. Who needs to know the migration is happening and when? Customer communications if downtime is expected? Internal stakeholders? Support team briefing?

Post-cutover validation window. Plan for a structured validation period after cutover — typically 30-60 minutes of active monitoring before declaring the migration complete. Identify the specific metrics that confirm the migration is successful: error rate below threshold, latency within normal range, all health checks passing, backup jobs running, monitoring alerts configured.
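The success criteria for the validation window can be expressed as a check over per-minute samples: the window must be complete, and every sample in it must be within limits. A minimal sketch, assuming your monitoring system can export samples in roughly this shape; the field names and thresholds are placeholders.

```python
def validation_passed(samples, max_error_rate, max_p95_latency_ms, min_minutes):
    """True only if the window is complete and every sample is within limits.

    samples is a list of per-minute dicts with the keys
    "error_rate", "p95_latency_ms", and "health_checks_ok".
    """
    if len(samples) < min_minutes:
        return False  # the validation window has not finished yet
    return all(
        s["error_rate"] <= max_error_rate
        and s["p95_latency_ms"] <= max_p95_latency_ms
        and s["health_checks_ok"]
        for s in samples
    )


# Hypothetical samples: 30 healthy minutes after cutover.
healthy_window = [
    {"error_rate": 0.001, "p95_latency_ms": 180, "health_checks_ok": True}
    for _ in range(30)
]

if __name__ == "__main__":
    ok = validation_passed(healthy_window, max_error_rate=0.01,
                           max_p95_latency_ms=250, min_minutes=30)
    print("Migration can be declared complete" if ok else "Keep monitoring or roll back")
```

Items such as backup jobs and alert configuration do not fit a per-minute metric and are better handled as explicit checklist entries alongside this check.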

The Timeline Question

Every migration takes longer than the estimate. The estimation principle that compensates: add 50% to whatever your technical team estimates. The discovery phase will surface scope you didn’t know existed. The cutover preparation will take longer than planned. Something unexpected will happen during cutover.

Build that buffer explicitly into the project timeline, not as a private assumption. Stakeholders who know the buffer exists don’t experience its use as slippage.

Our cloud migration and cost optimization practice has run migrations ranging from single-application lifts to complex multi-workload enterprise migrations. The principle is consistent: the quality of the pre-migration work determines the quality of the migration. Related: migrations that include DevOps and automation modernization — converting snowflake servers to IaC, introducing CI/CD — benefit from that work happening in parallel with migration planning.