Commit 970f94b

Initial commit
0 parents  commit 970f94b

33 files changed (+2407, -0 lines changed)

.gitignore

+62
@@ -0,0 +1,62 @@
1+
# Logs and output
2+
cdc_output/*
3+
!cdc_output/.gitkeep
4+
5+
# Environment files
6+
.env
7+
*.env
8+
.env.*
9+
10+
# Config files with secrets
11+
configs/userlist.txt
12+
configs/*local*.yaml
13+
configs/*local*.ini
14+
15+
# Docker
16+
docker/*local*
17+
.docker
18+
19+
# Dependencies
20+
elixir_app/_build/
21+
elixir_app/deps/
22+
elixir_app/.elixir_ls/
23+
elixir_app/cover/
24+
elixir_app/doc/
25+
26+
# Erlang
27+
*.beam
28+
*.plt
29+
erl_crash.dump
30+
31+
# Generated files
32+
*.pyc
33+
*.pyo
34+
*.pyd
35+
__pycache__/
36+
*.so
37+
*.dylib
38+
39+
# OS generated files
40+
.DS_Store
41+
.DS_Store?
42+
._*
43+
.Spotlight-V100
44+
.Trashes
45+
ehthumbs.db
46+
Thumbs.db
47+
48+
# Editor directories and files
49+
.idea/
50+
.vscode/
51+
*.swp
52+
*.swo
53+
*~
54+
55+
# Temporary files
56+
*.log
57+
*.tmp
58+
*.temp
59+
*.pid
60+
61+
# Keep directories
62+
!.gitkeep

README.md

+231
@@ -0,0 +1,231 @@
1+
# Zero-Downtime Database Switchover Tool
2+
3+
Zero-downtime PostgreSQL database switchover with support for physical replicas and Debezium CDC connectors. Read-only and read-write traffic are handled separately via PgBouncer to minimize application disruption.
4+
5+
## Features
6+
7+
- Zero-downtime switchover using PgBouncer
8+
- Support for multiple physical replicas
9+
- Debezium CDC connector management
10+
- Replication lag monitoring
11+
- Sequence synchronization
12+
- Separate read-only/read-write traffic handling
13+
- Forward and reverse switchover
14+
- Test environment with sample Elixir app
15+
- Continuous CDC monitoring
16+
- Read/write load simulation
17+
18+
## Prerequisites
19+
20+
- Docker and Docker Compose
21+
- PostgreSQL with logical replication enabled
22+
- PgBouncer
23+
- Kafka Connect with Debezium
24+
- `yq` tool for YAML processing
25+
- `psql` client
26+
- Elixir (for test app)
27+
28+
## Project Structure
29+
30+
```
31+
db_upgrade_setup/
32+
├── cdc_output/                    # CDC and test load output logs
33+
├── configs/
34+
│   ├── pgbouncer.ini              # PgBouncer configuration
35+
│   ├── switchover-config.yaml     # Main switchover configuration
36+
│   └── userlist.txt               # PgBouncer user credentials
37+
├── docker/                        # Docker-related files
38+
├── elixir_app/                    # Sample Elixir application for testing
39+
├── lib/                           # Core library functions
40+
│   ├── config.sh
41+
│   ├── debezium.sh
42+
│   ├── error_handler.sh
43+
│   ├── health_checks.sh
44+
│   ├── logging.sh
45+
│   ├── pgbouncer.sh
46+
│   ├── replication.sh
47+
│   └── sequences.sh
48+
└── scripts/
49+
    ├── continuous-cdc.sh          # CDC monitoring script
50+
    ├── copy_db.sh                 # Database copy utility
51+
    ├── init_replication.sh        # Replication setup
52+
    ├── seeds.sh                   # Database seeding
53+
    ├── switchover.sh              # Main switchover script
54+
    ├── test-setup.sh              # Test environment setup
55+
    ├── upgrade_pg.sh              # Database upgrade script
56+
    └── run.sh                     # Main wrapper script
57+
```
58+
59+
## Configuration
60+
61+
The switchover is configured in `configs/switchover-config.yaml`:
62+
63+
```yaml
64+
source:
65+
  internal_name: postgres_source
66+
  host: localhost
67+
  port: 5433
68+
  replicas:
69+
    - name: postgres_source_ro_1
70+
    - name: postgres_source_ro_2
71+
72+
target:
73+
  internal_name: postgres_target
74+
  host: localhost
75+
  port: 5434
76+
  replicas:
77+
    - name: postgres_target_ro_1
78+
    - name: postgres_target_ro_2
79+
80+
database:
81+
  name: testdb
82+
  user: testuser
83+
  password: testpass
84+
85+
connectors:
86+
  - name: postgres-connector
87+
    slot_name: debezium
88+
    publication_name: dbz_publication
89+
90+
pgbouncer:
91+
  config_file: pgbouncer.ini
92+
  admin_port: 6433
93+
  admin_user: testuser
94+
  admin_password: testpass
95+
  admin_database: pgbouncer
96+
  pools:
97+
    read_write:
98+
      name: testdb_rw
99+
    read_only:
100+
      name: testdb_ro
101+
102+
kafka:
103+
  connect_clusters:
104+
    - name: cdc
105+
      url: http://localhost:8083
106+
    - name: outbox
107+
      url: http://localhost:8084
108+
109+
replication:
110+
  max_lag_bytes: 20000
111+
  catchup_timeout: 60
112+
  sync_sequences_gap: 100000
113+
```
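
The `replication` settings above suggest a bounded wait: poll the replication lag until it drops to `max_lag_bytes`, and give up after `catchup_timeout` seconds. The following is a minimal sketch of such a loop, not the code in `lib/replication.sh` (which is not shown in this commit view). The names `wait_for_catchup` and `current_lag_bytes` are hypothetical, and the lag command is assumed to print the lag in bytes as an integer.

```bash
# Sketch only: wait_for_catchup and current_lag_bytes are hypothetical names,
# not taken from lib/replication.sh.

# Poll a lag command until the lag is <= max_lag or the timeout elapses.
# Usage: wait_for_catchup <max_lag_bytes> <timeout_seconds> [lag_command] [interval_seconds]
# The interval must be a positive integer number of seconds.
# Returns 0 when caught up, 1 on timeout, 2 if the lag command fails.
wait_for_catchup() {
  local max_lag=$1 timeout=$2 lag_cmd=${3:-current_lag_bytes} interval=${4:-1}
  local waited=0 lag

  while :; do
    # The lag command is assumed to print a single integer (bytes).
    lag=$("$lag_cmd") || return 2
    if (( lag <= max_lag )); then
      return 0
    fi
    if (( waited >= timeout )); then
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}
```

With the values from the example configuration, a caller might run `wait_for_catchup 20000 60 some_lag_command` and abort the switchover on a non-zero exit status.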
114+
115+
## Quick Start
116+
117+
### Main Commands
118+
119+
```bash
120+
./run.sh <command> [options]
121+
122+
Commands:
123+
switchover # Run database switchover
124+
test-setup # Set up test environment
125+
help # Show help message
126+
```
127+
128+
### Test Environment Setup
129+
130+
```bash
131+
# Full setup with sample app and CDC monitoring
132+
./run.sh test-setup --full
133+
134+
# Clean up everything
135+
./run.sh test-setup --destroy
136+
137+
# Restart only target database
138+
./run.sh test-setup --restart-target
139+
```
140+
141+
The `--full` setup performs the following steps:
142+
143+
1. Starts Docker containers (databases, Kafka)
144+
2. Seeds database with test data
145+
3. Launches Elixir test application
146+
4. Starts CDC monitoring
147+
5. Sets up read/write load simulation
148+
6. Initializes logging
149+
150+
### Running Switchover
151+
152+
```bash
153+
./run.sh switchover [options]
154+
155+
Options:
156+
-d, --direction # forward|reverse
157+
-m, --mode # full|readonly (default: full)
158+
-dbz, --debezium-mode # catchup|no-wait (default: no-wait)
159+
-c, --config # config file path
160+
-h, --help # show help
161+
```
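
The options above could be parsed along the following lines. This is a sketch under stated assumptions, not the parser in `scripts/switchover.sh`: the function name `parse_switchover_args` and the variable names are hypothetical, `--direction` is assumed to be required because no default is documented, and `-h`/`--help` handling is left out.

```bash
# Sketch only: function and variable names are hypothetical.
# Sets DIRECTION, MODE, DEBEZIUM_MODE and CONFIG_FILE; returns 1 on bad input.
parse_switchover_args() {
  DIRECTION=""
  MODE="full"               # documented default
  DEBEZIUM_MODE="no-wait"   # documented default
  CONFIG_FILE=""

  while (( $# > 0 )); do
    # Every supported option takes exactly one value.
    case "$1" in
      -d|--direction|-m|--mode|-dbz|--debezium-mode|-c|--config)
        if (( $# < 2 )); then
          echo "missing value for $1" >&2
          return 1
        fi
        ;;
      *)
        echo "unknown option: $1" >&2
        return 1
        ;;
    esac
    case "$1" in
      -d|--direction)       DIRECTION=$2 ;;
      -m|--mode)            MODE=$2 ;;
      -dbz|--debezium-mode) DEBEZIUM_MODE=$2 ;;
      -c|--config)          CONFIG_FILE=$2 ;;
    esac
    shift 2
  done

  # Validate against the documented value sets.
  case "$DIRECTION" in
    forward|reverse) ;;
    *) echo "direction must be forward or reverse" >&2; return 1 ;;
  esac
  case "$MODE" in
    full|readonly) ;;
    *) echo "mode must be full or readonly" >&2; return 1 ;;
  esac
  case "$DEBEZIUM_MODE" in
    catchup|no-wait) ;;
    *) echo "debezium mode must be catchup or no-wait" >&2; return 1 ;;
  esac
}
```

Validating the values up front means a typo such as `-m read-only` fails before any traffic is touched.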
162+
163+
## Switchover Modes
164+
165+
### Full Switchover (`-m full`)
166+
167+
- Switches all traffic
168+
- Manages replication slots and CDC
169+
- Syncs sequences
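
One plausible reading of "syncs sequences" is that each sequence on the new primary is advanced past the old primary's last value by `sync_sequences_gap`, so that IDs issued after the switch cannot collide with values the old primary handed out late. The sketch below only builds the SQL statement. The function name `build_setval_sql` is hypothetical, the sequence name is assumed to be a trusted identifier, and the real logic in `lib/sequences.sh` may differ.

```bash
# Sketch only: build_setval_sql is a hypothetical helper.
# Prints a statement that sets a sequence to <source last_value> + <gap>.
# Usage: build_setval_sql <sequence_name> <source_last_value> <gap>
build_setval_sql() {
  local seq=$1 last_value=$2 gap=$3
  # setval(regclass, bigint) is the standard PostgreSQL sequence function.
  printf "SELECT setval('%s', %d);\n" "$seq" "$(( last_value + gap ))"
}
```

With the example configuration, the output could be piped to the target database, for instance `build_setval_sql public.users_id_seq 42 100000 | psql -h localhost -p 5434 -U testuser -d testdb`, where `public.users_id_seq` is an illustrative sequence name.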
170+
171+
### Read-only Switchover (`-m readonly`)
172+
173+
- Switches only read traffic
174+
- Intended for testing or gradual migration
175+
- No write impact
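
Switching only read traffic could amount to repointing the read-only pool entry in `pgbouncer.ini` at the target replicas while leaving the read-write entry alone. The sketch below rewrites the `host=` value of a single `[databases]` entry. The function name `set_pool_hosts` is hypothetical, the pool name and host list are assumed to contain no `/` or regex metacharacters, and the approach in `lib/pgbouncer.sh` may differ.

```bash
# Sketch only: set_pool_hosts is a hypothetical helper.
# Reads PgBouncer config text on stdin and writes it to stdout with the
# host= value of one [databases] entry replaced.
# Usage: set_pool_hosts <pool_name> <comma_separated_hosts> < pgbouncer.ini
set_pool_hosts() {
  local pool=$1 hosts=$2
  # Match "<pool> = host=<value>" at the start of a line; other lines pass through.
  sed -E "s/^(${pool}[[:space:]]*=[[:space:]]*host=)[^[:space:]]+/\1${hosts}/"
}
```

After the file is rewritten, PgBouncer still has to re-read it. The admin console command `RELOAD` does that, for example `psql -h localhost -p 6433 -U testuser pgbouncer -c 'RELOAD;'` with the admin settings from the example configuration.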
176+
177+
## Monitoring
178+
179+
### CDC Output
180+
181+
All monitoring data is written to `cdc_output/`:
182+
- CDC events
183+
- Read/write load stats
184+
- App metrics
185+
- Replication status
186+
187+
### Test Application
188+
189+
The sample Elixir app helps verify:
190+
- Connection handling
191+
- Data consistency
192+
- Performance impact
193+
194+
## Cleanup
195+
196+
```bash
197+
./run.sh test-setup --destroy
198+
```
199+
200+
This stops:
201+
- Docker containers
202+
- CDC monitoring
203+
- Load simulation
204+
- Test application
205+
206+
## Troubleshooting
207+
208+
### Common Issues
209+
210+
**High Replication Lag**
211+
- Check `cdc_output` logs
212+
- Adjust `max_lag_bytes`
213+
- Verify load settings
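
When tuning `max_lag_bytes`, it helps to see how a lag in bytes follows from two WAL positions. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, where the first is the high 32 bits and the second the low 32 bits of a byte position. The sketch below does the arithmetic locally. The function names are hypothetical, and this is not necessarily how the tool measures lag.

```bash
# Sketch only: all function names here are hypothetical.

# Convert an LSN such as "0/3000060" to an absolute byte position.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Print the distance in bytes from <behind_lsn> up to <ahead_lsn>.
# Usage: lag_bytes <ahead_lsn> <behind_lsn>
lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Succeed if the lag between the two LSNs is at most <max_lag_bytes>.
# Usage: lag_within_limit <ahead_lsn> <behind_lsn> <max_lag_bytes>
lag_within_limit() {
  (( $(lag_bytes "$1" "$2") <= $3 ))
}
```

On the server side the same difference is available through `pg_wal_lsn_diff(lsn1, lsn2)`, which avoids parsing LSNs in the shell.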
214+
215+
**CDC Issues**
216+
- Check `continuous-cdc.sh` logs
217+
- Verify Kafka Connect
218+
- Check Debezium config
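
To verify Kafka Connect, the connector state can be read from the Connect REST API at `GET /connectors/<name>/status`. The sketch below only prints the `curl` commands for the clusters in the example configuration instead of running them. The function names are hypothetical, the cluster list is hard-coded to keep the example self-contained, and the README does not say which cluster hosts which connector, so a cluster that does not host the connector would be expected to answer with a 404.

```bash
# Sketch only: names are hypothetical; cluster URLs are copied from the
# example switchover-config.yaml rather than read with yq.
CONNECT_CLUSTERS=(
  "cdc=http://localhost:8083"
  "outbox=http://localhost:8084"
)

# Build the status URL for a connector on one Connect cluster.
# Usage: connector_status_url <base_url> <connector_name>
connector_status_url() {
  local base_url=${1%/} connector=$2
  printf '%s/connectors/%s/status\n' "$base_url" "$connector"
}

# Print (do not execute) one status check per configured cluster.
# Usage: print_status_checks <connector_name>
print_status_checks() {
  local connector=$1 entry
  for entry in "${CONNECT_CLUSTERS[@]}"; do
    printf '# cluster %s\ncurl -fsS %s\n' \
      "${entry%%=*}" "$(connector_status_url "${entry#*=}" "$connector")"
  done
}
```

Running `print_status_checks postgres-connector` lists the commands to try by hand; the JSON response reports the connector and task states.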
219+
220+
**App Errors**
221+
- Check Elixir app logs
222+
- Verify connectivity
223+
- Check PgBouncer status
224+
225+
## Support
226+
227+
When reporting an issue, provide:
228+
- Error messages
229+
- Relevant logs
230+
- Reproduction steps
231+
- Environment details

configs/pgbouncer.ini

+26
@@ -0,0 +1,26 @@
1+
[databases]
2+
testdb_rw = host=postgres_source port=5432 auth_user=testuser dbname=testdb
3+
testdb_ro = host=postgres_source_ro_1,postgres_source_ro_2 port=5432 auth_user=testuser dbname=testdb
4+
5+
[pgbouncer]
6+
listen_addr = 0.0.0.0
7+
listen_port = 6432
8+
unix_socket_dir =
9+
user = postgres
10+
auth_file = /etc/pgbouncer/userlist.txt
11+
auth_type = trust
12+
pool_mode = transaction
13+
max_client_conn = 1000
14+
default_pool_size = 100
15+
ignore_startup_parameters = extra_float_digits
16+
server_round_robin = 1
17+
max_prepared_statements = 1000
18+
19+
# Admin console access
20+
admin_users = testuser
21+
22+
# Connection sanity checks, timeouts
23+
24+
# TLS settings
25+
26+
# Dangerous timeouts

configs/switchover-config.yaml

+44
@@ -0,0 +1,44 @@
1+
# switchover-config.yaml
2+
source:
3+
  internal_name: postgres_source
4+
  host: localhost
5+
  port: 5433
6+
  replicas:
7+
    - name: postgres_source_ro_1
8+
    - name: postgres_source_ro_2
9+
target:
10+
  internal_name: postgres_target
11+
  host: localhost
12+
  port: 5434
13+
  replicas:
14+
    - name: postgres_target_ro_1
15+
    - name: postgres_target_ro_2
16+
database:
17+
  name: testdb
18+
  user: testuser
19+
  password: testpass
20+
connectors:
21+
  - name: postgres-connector
22+
    slot_name: debezium
23+
    publication_name: dbz_publication
24+
pgbouncer:
25+
  config_file: pgbouncer.ini
26+
  admin_port: 6433
27+
  admin_user: testuser # PgBouncer admin user
28+
  admin_password: testpass # PgBouncer admin password
29+
  admin_database: pgbouncer # PgBouncer admin database name
30+
  pools:
31+
    read_write:
32+
      name: testdb_rw
33+
    read_only:
34+
      name: testdb_ro
35+
kafka:
36+
  connect_clusters:
37+
    - name: cdc
38+
      url: http://localhost:8083
39+
    - name: outbox
40+
      url: http://localhost:8084
41+
replication:
42+
  max_lag_bytes: 20000 # Maximum acceptable lag in bytes
43+
  catchup_timeout: 60 # Maximum time to wait for catchup in seconds
44+
  sync_sequences_gap: 100000 # Gap to maintain between sequences after switch
