Cloud‑to‑Cloud Database Migration: A Practical Guide for Large PostgreSQL Systems

Migrating infrastructure from one cloud provider to another is already a complex process. But migrating a large, production PostgreSQL…

Cloud‑to‑Cloud Database Migration: A Practical Guide for Large PostgreSQL Systems

Migrating infrastructure from one cloud provider to another is already a complex process. But migrating a large, production PostgreSQL database, with schemas, constraints, triggers, partitions, indexes, extensions, and years of operational history, is an entirely different level of engineering.

This article breaks down the real‑world challenges, decisions, risks, and best practices involved in such migrations.

Why Database Migration Is Uniquely Hard

Applications are portable.

Containers can be rebuilt.

CI pipelines can be recreated.

But databases are:

Stateful
Performance‑sensitive
Integrity‑critical
Operationally fragile

A single wrong command can:

Drop constraints
Disable triggers
Break partitions
Corrupt encodings
Destroy indexes
Or silently degrade performance

Cloud database migration is not data transfer.

It is data engineering + systems engineering + risk management.

Common Use Cases for Cloud‑to‑Cloud Migration

Teams usually migrate because of:

Cost optimization (AWS → GCP / GCP → DO)
Vendor lock‑in concerns
Regulatory or data‑residency requirements
Performance issues
Infrastructure consolidation
Reliability problems

Each use case affects the migration strategy.

Example:

Cost‑driven migration → minimize downtime
Compliance migration → prioritize auditability
Performance migration → re‑design indexes & partitions

What Actually Needs to Be Migrated (Beyond Tables)

Most failures happen because teams think only about data rows. In reality you must migrate:

1. Schemas

Namespaces
Ownership
Permissions

2. Tables

Data types
Defaults
NOT NULL constraints

3. Indexes

B‑Tree / GIN / GiST / BRIN
Partial indexes
Expression indexes

4. Constraints

Primary keys
Foreign keys
Unique constraints
Check constraints

5. Partitions

Range / list / hash partitions
Partition constraints
Parent‑child table relationships

6. Triggers

Audit triggers
Business logic triggers
Replication triggers

7. Functions & Procedures

PL/pgSQL
Extensions

8. Extensions

PostGIS
pgcrypto
uuid‑ossp
citext
TimescaleDB

9. Migration history

Flyway / Liquibase tables
Schema version tracking

Missing any one of these can break your system in subtle ways.

Strategic Decisions Before You Touch Any Command

1. Downtime vs Live Migration

Options:

Full downtime migration
Read‑only window
Dual‑write period
Logical replication

Each increases complexity exponentially.

2. Database Engine Parity

Verify:

PostgreSQL major version
Extensions availability
Default configs
Collation & locale

Even small version mismatches can break indexes or functions.

3. Data Volume

4. Security Model

Users
Roles
Grants
Secrets

Cloud providers handle IAM very differently.

pg_dump & pg_restore: Where Most Migrations Fail

Choosing the Correct Dump Format

Recommended Commands

Schema only

pg_dump -Fc --schema-only -f schema.dump dbname

Data only

pg_dump -Fc --data-only -f data.dump dbname

Full dump

pg_dump -Fd -j 8 -f dumpdir dbname

Restore Order (Critical)

Roles & users
Schemas
Extensions
Tables
Data
Indexes
Constraints
Triggers
Validation

Wrong order = broken foreign keys or missing triggers.

The Dangerous Commands (Real Production Killers)

pg_dumpall > backup.sql # huge, slow, unsafe

pg_restore --clean # may DROP critical objects

DROP SCHEMA public CASCADE;

SET session_replication_role = replica; # disables triggers

One wrong flag → irreversible damage.

Configuration Differences That Break Systems

Cloud providers ship different defaults:

work_mem
shared_buffers
wal_level
max_connections
autovacuum settings
synchronous_commit

These affect:

Query latency
Index usage
Deadlocks
Replication stability

Always compare:

SHOW ALL;

Performance Regression After Migration

Common causes:

Missing indexes
Statistics not refreshed
Different query planner
Changed collation
IO throughput differences

Fix with:

ANALYZE;
REINDEX DATABASE dbname;

Validation Checklist (Non‑Negotiable)

Structure

SELECT count(*) FROM information_schema.tables;

Triggers

SELECT * FROM information_schema.triggers;

Indexes

SELECT * FROM pg_indexes;

Extensions

SELECT * FROM pg_extension;

Row counts

Compare source vs target.

Application tests

Run full regression tests.

Real Risks Nobody Talks About

Silent data truncation
Encoding mismatch
Timezone corruption
Floating‑point drift
Broken audit trails
Lost permissions

Most of these are detected weeks later in production.

When Your Database Is in Terabytes and Billions of Rows

At small scale, mistakes are painful.

At terabyte scale, mistakes are catastrophic.

When your database contains:

Multiple TB of data
Billions of rows
Hundreds of tables
Thousands of indexes
Complex partitions
Heavy write traffic

…migration becomes a distributed systems problem, not just a database task.

Additional Challenges at TB Scale

1. Network Throughput Becomes the Bottleneck

Cross-cloud bandwidth limits
Packet loss
Throttling
NAT saturation

Even at 1 Gbps:

1 TB ≈ 2.5 hours (ideal conditions)

5 TB ≈ 12+ hours

10 TB ≈ 24+ hours

Real-world speeds are usually 40–60% of theoretical.

2. Storage IOPS Decide Success or Failure

Cloud disks differ massively:

GCP PD-SSD
AWS gp3 / io2
DigitalOcean volumes

Low IOPS =

Slow restores
Checkpoint storms
Autovacuum backlog
Index build failures

Always benchmark disk write speed before restore.

3. pg_dump Stops Being Enough

For very large databases:

pg_dump becomes slow
Single-threaded metadata phase blocks
Restore time explodes

Better approaches:

pg_dump in directory format with high parallelism
Logical replication
Chunked table migration
Physical replication (pg_basebackup)

Migration Strategies for Massive Databases

Strategy 1: Full Downtime (Rarely Acceptable)

Stop writes
Dump everything
Restore
Start application

Works only for internal systems or very tolerant products.

Strategy 2: Schema First + Batch Data Migration

Dump schema
Create structure on target
Migrate tables in chunks
Build indexes later

Reduces risk but complex to orchestrate.

Strategy 3: Logical Replication (Most Common)

Setup publisher on source
Setup subscriber on target
Sync initial snapshot
Catch up changes
Switch traffic

Pros:

Near zero downtime
Safer cutover

Cons:

Complex
Requires identical schema
Triggers behave differently

Strategy 4: Physical Replication

Use pg_basebackup
Copy WAL files
Promote replica

Fastest but risky across clouds and versions.

The “One Wrong Command” Problem (At Scale)

At TB scale, mistakes are not easily reversible.

Examples:

pg_restore --clean

→ Drops objects while application is live.

pg_restore -j 32

→ Can overload disks and crash the server.

pg_dump --disable-triggers

→ Breaks audit systems.

ALTER SYSTEM SET synchronous_commit = off;

→ Silent data loss during crash.

SET maintenance_work_mem = '64MB';

→ Index creation takes days instead of hours.

Configuration That Can Break Large Restores

Critical settings during restore:

maintenance_work_mem
work_mem
shared_buffers
wal_buffers
checkpoint_timeout
max_wal_size
synchronous_commit
fsync
autovacuum

Bad tuning leads to:

WAL explosion
Disk full
Checkpoint storms
Table corruption
Restore freezing at 99%

pg_dump Format Decision Matrix (TB Scale)

Recommended:

pg_dump -Fd -j 16 --no-owner --no-acl -f dumpdir dbname

Restore:

pg_restore -Fd -j 16 -d target_db dumpdir

Data Integrity Verification at Massive Scale

Row counts are not enough.

Also verify:

Checksums
Hash totals
Min/max timestamps
Foreign key violations
Application-level aggregates

Example:

SELECT SUM(amount) FROM transactions;

Compare source vs target.

Operational Reality

Large migrations require:

Dry runs
Rollback plan
Monitoring dashboards
Disk usage alerts
WAL growth alerts
CPU saturation alerts

And one truth:

You never migrate once.

You rehearse many times.

Architecture Diagrams (Reference Patterns)

Below are practical reference architectures commonly used for large-scale PostgreSQL migrations.

1. Simple Offline Migration (Full Downtime)

┌──────────────┐ pg_dump / pg_restore ┌──────────────┐
│ Source DB │ ─────────────────────────────▶ │ Target DB │
│ (Cloud A) │ │ (Cloud B) │
└──────────────┘ └──────────────┘
▲ ▲
│ │
└────────────── Application Down ──────────────────┘

Use when:

Internal tools
Small datasets
Downtime acceptable

2. Schema-First + Batch Data Migration

Step 1: Schema
┌──────────────┐ pg_dump --schema-only ┌──────────────┐
│ Source DB │ ───────────────────────▶ │ Target DB │
└──────────────┘ └──────────────┘

      Step 2: Data (in chunks)

┌──────────────┐ Table batches / jobs ┌──────────────┐
│ Source DB │ ───────────────────────▶ │ Target DB │
└──────────────┘ └──────────────┘

      Step 3: Indexes + Triggers

Use when:

Medium–large DB
Partial downtime allowed
Need more control

3. Logical Replication (Near Zero Downtime)

Initial Snapshot + WAL Streaming
┌──────────────┐ ───────────────────────────────────────▶ ┌──────────────┐
│ Source DB │ │ Target DB │
│ (Publisher) │ │ (Subscriber) │
└──────────────┘ └──────────────┘
▲ ▲
│ │
│ Replication Slot │
└─────────────────────────────────────────────────────────┘

             Traffic Cutover

┌──────────────┐ ┌──────────────┐
│ Application │ ───────────── switch ───────────▶│ Target DB │
└──────────────┘ └──────────────┘

Use when:

Production systems
Billions of rows
Downtime must be minimal

4. Physical Replication (pg_basebackup)

┌──────────────┐ Base Backup + WAL Files ┌──────────────┐
│ Source DB │ ───────────────────────────────▶ │ Replica DB │
│ (Primary) │ │ (Cloud B) │
└──────────────┘ └──────────────┘
│
▼
Promote to Primary

Use when:

Same PostgreSQL version
Compatible storage
Very large datasets
Advanced DBA team

5. Enterprise-Grade Migration with Staging Layer

┌─────────────────┐
│ Monitoring & │
│ Validation │
└────────┬────────┘
│
▼
┌──────────────┐ Logical Replication ┌──────────────┐
│ Source DB │ ─────────────────────▶ │ Staging DB │
└──────────────┘ └──────────────┘
│
│ Validation + Index build
▼
┌──────────────┐
│ Target DB │
└──────────────┘
▲
│
Application Cutover

Use when:

Mission-critical systems
Regulatory requirements
Very large datasets
Strict validation needs

Key Notes for All Architectures

Always separate schema migration from data migration
Never build heavy indexes before loading data
Always validate before traffic cutover
Monitor WAL growth and disk usage continuously
Keep rollback architecture ready

Final Thoughts

Cloud database migration at TB scale is not DevOps work.

It is:

Distributed systems engineering
Data architecture
Capacity planning
Risk management
Incident prevention

Treat it like a spacecraft launch.

Because at this scale…

one wrong command doesn’t cause a bug.

It causes an outage.

A data incident.

Or a company-level crisis.

By Jatin Jain Saraf on January 11, 2026.

Cloud‑to‑Cloud Database Migration: A Practical Guide for Large PostgreSQL Systems