Cloud‑to‑Cloud Database Migration: A Practical Guide for Large PostgreSQL Systems
Migrating infrastructure from one cloud provider to another is already a complex process. But migrating a large, production PostgreSQL…
Cloud‑to‑Cloud Database Migration: A Practical Guide for Large PostgreSQL Systems
Migrating infrastructure from one cloud provider to another is already a complex process. But migrating a large, production PostgreSQL database, with schemas, constraints, triggers, partitions, indexes, extensions, and years of operational history, is an entirely different level of engineering.
This article breaks down the real‑world challenges, decisions, risks, and best practices involved in such migrations.
Why Database Migration Is Uniquely Hard
Applications are portable.
Containers can be rebuilt.
CI pipelines can be recreated.
But databases are:
- Stateful
- Performance‑sensitive
- Integrity‑critical
- Operationally fragile
A single wrong command can:
- Drop constraints
- Disable triggers
- Break partitions
- Corrupt encodings
- Destroy indexes
- Or silently degrade performance
Cloud database migration is not data transfer.
It is data engineering + systems engineering + risk management.
Common Use Cases for Cloud‑to‑Cloud Migration
Teams usually migrate because of:
- Cost optimization (AWS → GCP / GCP → DO)
- Vendor lock‑in concerns
- Regulatory or data‑residency requirements
- Performance issues
- Infrastructure consolidation
- Reliability problems
Each use case affects the migration strategy.
Example:
- Cost‑driven migration → minimize downtime
- Compliance migration → prioritize auditability
- Performance migration → re‑design indexes & partitions
What Actually Needs to Be Migrated (Beyond Tables)
Most failures happen because teams think only about data rows. In reality you must migrate:
1. Schemas
- Namespaces
- Ownership
- Permissions
2. Tables
- Data types
- Defaults
- NOT NULL constraints
3. Indexes
- B‑Tree / GIN / GiST / BRIN
- Partial indexes
- Expression indexes
4. Constraints
- Primary keys
- Foreign keys
- Unique constraints
- Check constraints
5. Partitions
- Range / list / hash partitions
- Partition constraints
- Parent‑child table relationships
6. Triggers
- Audit triggers
- Business logic triggers
- Replication triggers
7. Functions & Procedures
- PL/pgSQL
- Extensions
8. Extensions
- PostGIS
- pgcrypto
- uuid‑ossp
- citext
- TimescaleDB
9. Migration history
- Flyway / Liquibase tables
- Schema version tracking
Missing any one of these can break your system in subtle ways.
Strategic Decisions Before You Touch Any Command
1. Downtime vs Live Migration
Options:
- Full downtime migration
- Read‑only window
- Dual‑write period
- Logical replication
Each increases complexity exponentially.
2. Database Engine Parity
Verify:
- PostgreSQL major version
- Extensions availability
- Default configs
- Collation & locale
Even small version mismatches can break indexes or functions.
3. Data Volume
4. Security Model
- Users
- Roles
- Grants
- Secrets
Cloud providers handle IAM very differently.
pg_dump & pg_restore: Where Most Migrations Fail
Choosing the Correct Dump Format
Recommended Commands
Schema only
pg_dump -Fc --schema-only -f schema.dump dbname
Data only
pg_dump -Fc --data-only -f data.dump dbname
Full dump
pg_dump -Fd -j 8 -f dumpdir dbname
Restore Order (Critical)
- Roles & users
- Schemas
- Extensions
- Tables
- Data
- Indexes
- Constraints
- Triggers
- Validation
Wrong order = broken foreign keys or missing triggers.
The Dangerous Commands (Real Production Killers)
pg_dumpall > backup.sql # huge, slow, unsafe
pg_restore --clean # may DROP critical objects
DROP SCHEMA public CASCADE;
SET session_replication_role = replica; # disables triggers
One wrong flag → irreversible damage.
Configuration Differences That Break Systems
Cloud providers ship different defaults:
- work_mem
- shared_buffers
- wal_level
- max_connections
- autovacuum settings
- synchronous_commit
These affect:
- Query latency
- Index usage
- Deadlocks
- Replication stability
Always compare:
SHOW ALL;
Performance Regression After Migration
Common causes:
- Missing indexes
- Statistics not refreshed
- Different query planner
- Changed collation
- IO throughput differences
Fix with:
ANALYZE;
REINDEX DATABASE dbname;
Validation Checklist (Non‑Negotiable)
Structure
SELECT count(*) FROM information_schema.tables;
Triggers
SELECT * FROM information_schema.triggers;
Indexes
SELECT * FROM pg_indexes;
Extensions
SELECT * FROM pg_extension;
Row counts
Compare source vs target.
Application tests
Run full regression tests.
Real Risks Nobody Talks About
- Silent data truncation
- Encoding mismatch
- Timezone corruption
- Floating‑point drift
- Broken audit trails
- Lost permissions
Most of these are detected weeks later in production.
When Your Database Is in Terabytes and Billions of Rows
At small scale, mistakes are painful.
At terabyte scale, mistakes are catastrophic.
When your database contains:
- Multiple TB of data
- Billions of rows
- Hundreds of tables
- Thousands of indexes
- Complex partitions
- Heavy write traffic
…migration becomes a distributed systems problem, not just a database task.
Additional Challenges at TB Scale
1. Network Throughput Becomes the Bottleneck
- Cross-cloud bandwidth limits
- Packet loss
- Throttling
- NAT saturation
Even at 1 Gbps:
1 TB ≈ 2.5 hours (ideal conditions)
5 TB ≈ 12+ hours
10 TB ≈ 24+ hours
Real-world speeds are usually 40–60% of theoretical.
2. Storage IOPS Decide Success or Failure
Cloud disks differ massively:
- GCP PD-SSD
- AWS gp3 / io2
- DigitalOcean volumes
Low IOPS =
- Slow restores
- Checkpoint storms
- Autovacuum backlog
- Index build failures
Always benchmark disk write speed before restore.
3. pg_dump Stops Being Enough
For very large databases:
- pg_dump becomes slow
- Single-threaded metadata phase blocks
- Restore time explodes
Better approaches:
- pg_dump in directory format with high parallelism
- Logical replication
- Chunked table migration
- Physical replication (pg_basebackup)
Migration Strategies for Massive Databases
Strategy 1: Full Downtime (Rarely Acceptable)
- Stop writes
- Dump everything
- Restore
- Start application
Works only for internal systems or very tolerant products.
Strategy 2: Schema First + Batch Data Migration
- Dump schema
- Create structure on target
- Migrate tables in chunks
- Build indexes later
Reduces risk but complex to orchestrate.
Strategy 3: Logical Replication (Most Common)
- Setup publisher on source
- Setup subscriber on target
- Sync initial snapshot
- Catch up changes
- Switch traffic
Pros:
- Near zero downtime
- Safer cutover
Cons:
- Complex
- Requires identical schema
- Triggers behave differently
Strategy 4: Physical Replication
- Use pg_basebackup
- Copy WAL files
- Promote replica
Fastest but risky across clouds and versions.
The “One Wrong Command” Problem (At Scale)
At TB scale, mistakes are not easily reversible.
Examples:
pg_restore --clean
→ Drops objects while application is live.
pg_restore -j 32
→ Can overload disks and crash the server.
pg_dump --disable-triggers
→ Breaks audit systems.
ALTER SYSTEM SET synchronous_commit = off;
→ Silent data loss during crash.
SET maintenance_work_mem = '64MB';
→ Index creation takes days instead of hours.
Configuration That Can Break Large Restores
Critical settings during restore:
maintenance_work_mem
work_mem
shared_buffers
wal_buffers
checkpoint_timeout
max_wal_size
synchronous_commit
fsync
autovacuum
Bad tuning leads to:
- WAL explosion
- Disk full
- Checkpoint storms
- Table corruption
- Restore freezing at 99%
pg_dump Format Decision Matrix (TB Scale)
Recommended:
pg_dump -Fd -j 16 --no-owner --no-acl -f dumpdir dbname
Restore:
pg_restore -Fd -j 16 -d target_db dumpdir
Data Integrity Verification at Massive Scale
Row counts are not enough.
Also verify:
- Checksums
- Hash totals
- Min/max timestamps
- Foreign key violations
- Application-level aggregates
Example:
SELECT SUM(amount) FROM transactions;
Compare source vs target.
Operational Reality
Large migrations require:
- Dry runs
- Rollback plan
- Monitoring dashboards
- Disk usage alerts
- WAL growth alerts
- CPU saturation alerts
And one truth:
You never migrate once.
You rehearse many times.
Architecture Diagrams (Reference Patterns)
Below are practical reference architectures commonly used for large-scale PostgreSQL migrations.
1. Simple Offline Migration (Full Downtime)
┌──────────────┐ pg_dump / pg_restore ┌──────────────┐
│ Source DB │ ─────────────────────────────▶ │ Target DB │
│ (Cloud A) │ │ (Cloud B) │
└──────────────┘ └──────────────┘
▲ ▲
│ │
└────────────── Application Down ──────────────────┘
Use when:
- Internal tools
- Small datasets
- Downtime acceptable
2. Schema-First + Batch Data Migration
Step 1: Schema
┌──────────────┐ pg_dump --schema-only ┌──────────────┐
│ Source DB │ ───────────────────────▶ │ Target DB │
└──────────────┘ └──────────────┘
Step 2: Data (in chunks)
┌──────────────┐ Table batches / jobs ┌──────────────┐
│ Source DB │ ───────────────────────▶ │ Target DB │
└──────────────┘ └──────────────┘
Step 3: Indexes + Triggers
Use when:
- Medium–large DB
- Partial downtime allowed
- Need more control
3. Logical Replication (Near Zero Downtime)
Initial Snapshot + WAL Streaming
┌──────────────┐ ───────────────────────────────────────▶ ┌──────────────┐
│ Source DB │ │ Target DB │
│ (Publisher) │ │ (Subscriber) │
└──────────────┘ └──────────────┘
▲ ▲
│ │
│ Replication Slot │
└─────────────────────────────────────────────────────────┘
Traffic Cutover
┌──────────────┐ ┌──────────────┐
│ Application │ ───────────── switch ───────────▶│ Target DB │
└──────────────┘ └──────────────┘
Use when:
- Production systems
- Billions of rows
- Downtime must be minimal
4. Physical Replication (pg_basebackup)
┌──────────────┐ Base Backup + WAL Files ┌──────────────┐
│ Source DB │ ───────────────────────────────▶ │ Replica DB │
│ (Primary) │ │ (Cloud B) │
└──────────────┘ └──────────────┘
│
▼
Promote to Primary
Use when:
- Same PostgreSQL version
- Compatible storage
- Very large datasets
- Advanced DBA team
5. Enterprise-Grade Migration with Staging Layer
┌─────────────────┐
│ Monitoring & │
│ Validation │
└────────┬────────┘
│
▼
┌──────────────┐ Logical Replication ┌──────────────┐
│ Source DB │ ─────────────────────▶ │ Staging DB │
└──────────────┘ └──────────────┘
│
│ Validation + Index build
▼
┌──────────────┐
│ Target DB │
└──────────────┘
▲
│
Application Cutover
Use when:
- Mission-critical systems
- Regulatory requirements
- Very large datasets
- Strict validation needs
Key Notes for All Architectures
- Always separate schema migration from data migration
- Never build heavy indexes before loading data
- Always validate before traffic cutover
- Monitor WAL growth and disk usage continuously
- Keep rollback architecture ready
Final Thoughts
Cloud database migration at TB scale is not DevOps work.
It is:
- Distributed systems engineering
- Data architecture
- Capacity planning
- Risk management
- Incident prevention
Treat it like a spacecraft launch.
Because at this scale…
one wrong command doesn’t cause a bug.
It causes an outage.
A data incident.
Or a company-level crisis.
By Jatin Jain Saraf on January 11, 2026.